ABSTRACT


Conducting safe flight operations from aircraft carriers requires
accurate and timely dissemination of aircraft status information from
the Carrier Air Traffic Control Center (CATCC). Presently, the
information is manually displayed on status boards throughout the ship by a
network of sailors communicating via sound-powered microphones. A
prototype connected-speech system, developed by the Naval
Ocean Systems Command (NOSC), was evaluated. The evaluation covered
the hardware, the software, and the man-machine interface.
The use of connected speech as an input modality across varying noise
and syntactic conditions was experimentally tested. The research
resulted in proposed guidelines for designing connected
speech syntaxes and specific recommendations for future prototype
development efforts.
I. INTRODUCTION


A.  GENERAL 

Managing,  maintaining,  interpreting,  and  displaying  information  is 
of  critical  importance  to  the  safe  and  efficient  operation  of  aircraft 
from  a  Naval  aircraft  carrier.  The  successful  execution  of  the  carrier’s 
mission  is  largely  dependent  upon  the  ability  to  rapidly  and  safely 
launch,  track,  and  recover  high-performance  aircraft  operating  from 
the  carrier’s  deck.  This  thesis  describes  the  research  and  evaluation 
of  an  automated  information  system  designed  to  improve  the  present 
manual method of maintaining and displaying aircraft status information
in direct support of aircraft launch and recovery operations.

B.  PROJECT  BACKGROUND 

1. Purpose

The Naval Ocean Systems Command (NOSC), located in San
Diego, California, developed a prototype information system to replace
the current manual method of maintaining status board information in
the Carrier Air Traffic Control Center (CATCC). The primary objective
is to implement a system that automates the maintenance, display,
and distribution of aircraft status information using voice and/or
keyboard as the input modality.

2. Key Participants

The primary participants in the project and their responsibilities
were:

Activity                   Responsibility

NOSC (Code 44)             System design and development
NPS (Code 55)              Prototype evaluation
NavAir                     Functional management
USS Constellation          Primary test site
ITT, Defense Comm. Div.    Technical support, as requested


3. Status

A preliminary functional description has been developed,
upon which the prototype system is based. Software development and
initial testing were conducted at NOSC, San Diego, based on the
preliminary design efforts conducted at that activity. Following initial
development, field testing and evaluation were conducted at the Naval
Postgraduate School (NPS) prior to full-scale shipboard testing.


C.  SCOPE 

In coordination with the thesis advisor, the research domain was
limited to three primary areas of interest. The first is to evaluate the
prototype system as delivered by NOSC, San Diego. The specific purpose is
to objectively evaluate the system by gaining “hands-on” experience in
the training, testing, and operation of functional system components. The
second is to make a general determination concerning the
feasibility of automating the current system using some combination of
voice and keyboard data entry to a computer-based system. The third is
to provide, based on evaluation and empirical testing, specific
recommendations for future project efforts.

D.  METHODOLOGY 

This  research  was  conducted  using  the  following  approach: 
1. Review voice recognition technology.

2.  Study  the  CATCC  operating  environment. 

3.  Gain  experience  using  the  NOSC  prototype. 

4.  Train  a  small  user  population  on  the  NOSC  system. 

5. Conduct an experiment to evaluate the installed system.

6.  Analyze  the  results. 

7.  Make  specific  recommendations  based  on  experiences  and  test 
results. 

E.  LIMITATIONS 

The primary research area is limited to evaluating the NOSC
prototype, as delivered. Modifications by NPS were limited to those
required to accomplish specific test objectives. The research is also
limited in several other respects. First, the system was not, during the
course of this research, tested in an at-sea environment. Second, the
test subjects, although familiar with CATCC operations, are not
expected to be as skilled as the sailors who participate in these
operations on a day-to-day basis. Third, the system developed by NOSC is
designed to meet generic CATCC requirements; the operational
peculiarities of a specific CATCC were not considered. Finally, the
researchers were unable to visit a CATCC during flight operations in
the conduct of the study. CATCC-experienced officers were used
instead to provide a rudimentary insight into essential details.

F.  ORGANIZATION  OF  THE  THESIS 

The thesis is organized by major topical components, which are
divided into distinct chapters. Depending upon the
experience of the reader, chapters may be omitted without loss of
continuity. Each chapter is preceded by a chapter executive
summary, giving the reader an opportunity to judge the contents
prior to reading. Following this brief introduction, Chapter II presents
a primer on voice recognition systems written for those unfamiliar
with the technology. Chapter III discusses the mission, organization,
and operational environment of a typical CATCC. Chapter IV
introduces the NOSC prototype system, as delivered to NPS. Chapter V
covers system testing. Finally, Chapter VI contains
recommendations and conclusions.
This chapter is a basic introduction to a variety of voice
technologies and techniques. Specific topics discussed include how speech
recognizers work, categories of speech recognition, typical
applications, design criteria, and a tutorial on the development of
connected phrase syntaxes.

A.  SPEECH  RECOGNITION  TECHNOLOGY 
1.  Speech  Composition 

Human speech is a complex, well-defined process of conveying
information. The process starts with the brain, which sends signals
to the muscles and organs used to make speech. The formation
of speech sounds then occurs, and the process ends with interpretation
by the listener. This section provides a basic foundation for
understanding the way speech is formed, the composition of the
speech signal, and the informational components of speech.

The physical process of speaking is achieved by the
interaction of the lips, tongue, and teeth. Five types of speech sounds
articulated in English are [Ref. 1:p. 13]:

1. Plosives are sounds created by stopping the passage of air.
An example is the letter “t” in the word “top.”

2. Fricatives are caused by forming a narrow passage through which
air may pass. The digraph “th” in the word “their” is an
example.

3. Laterals are sounds formed when the tongue touches the roof of
the mouth. An example is the “l” in “launch.”

4. Trills are caused by the rapid vibration of one of the articulators
(lips, tongue, etc.). The letter “r” is a trill sound in some
languages.

5. Vowels are those sounds made when unobstructed air passes
over the vocal cords.

Human speech, then, consists of strings of phonemes, which
are the atomic units of sound. Most spoken languages require between
20 and 60 phonemes [Ref. 2:p. 128]. Table 2.1, adapted from
Reference 2, p. 127, contains the phonemes typically associated with
English. Analysis of the phonemes required for a word viewed in
isolation is not sufficient because word sounds change depending upon
the word's location within a string of words. A language's phonological
rules govern the phonemes associated with a specific word depending upon
the other sounds immediately preceding and following the word.


TABLE 2.1

ENGLISH PHONEMES


beat    bit     bait    bet     bat     Bob
but     batter  bought  boat    book    boot
about   roses   bird    down    buy     bay
you     wit     cent    let     met     net
sing    net     ten     kit     bet     debt
get     hat     fat     thing   sat     shut
vat     that    zoo     azure   church  judge
which   battle  bottom  button

Speech understanding is not based on word sounds alone.
Understanding requires knowledge not only of what was said but
also of how it was said. Hearing phonemes is the basis for what was said.
Interpreting the stress, tempo, intonation, and the placement and
duration of pauses indicates how it was spoken. This aspect of speech is
termed prosodics. An example is understanding the difference between the
following sentences:

“I can see a head.” vs. “I can see ahead.”

The sentences contain identical sounds, yet the prosodics of speech
resolves the ambiguity that would arise if pauses were not considered in
the interpretation of what was said. Frequently, though, prosodics
alone is insufficient for understanding, as in the case of poor
enunciation. Resolution of ambiguity may also involve an understanding of
the context in which a phrase was spoken, which is termed pragmatics.

Human speech is also governed by a structure we know as
grammar. The grammatical structure is represented by a syntax.
English syntax, for example, requires a proper sentence to be
composed of a noun and a verb phrase. The syntactic rules, in conjunction
with prosodics, govern how an utterance may be correctly spoken.
Linguistic theory suggests that the more complex the syntactic
constructs, the more powerful the language.

The human process of semantic analysis of speech, then,
relies not only on hearing the strings of phonemes but also on using
the prosodics, pragmatics, and syntax of the language in order to
understand not only what was said but also what was meant. This
ability allows us to correctly interpret idiomatic phrases such as
“up in arms” and “over the hill.”

Depending upon the application, speech systems may offer
varying degrees of sophistication, from the simple phoneme
interpreter (an isolated word recognizer) to a system capable of resolving
prosodic and semantic ambiguity (a natural language processor).

2. Speech Analysis

Understanding how speech is analyzed by a machine is
simplified by developing parallels between the more familiar human
process and the unfamiliar machine process. Figure 2.1 diagrams the
fundamental components of any speech analyzer. The Knowledge Source
reflects the relative maturity of the system, human or machine. Just as
children can be “programmed” to understand, so can a machine. The
sophistication or robustness of a speech analyzer, then, is directly
related to its ability to process the variety of speech information
(phonological rules, prosodics, syntax, and pragmatics).


Figure 2.1

Speech  Recognition  Process 
A fundamental algorithm for understanding what was said is
found in Figure 2.2 [Ref. 3:p. 505]. This is a classic speech signal
analysis algorithm that most processors use, regardless of the technology
involved. Conversion of the human analog signal to a discrete digital
signal in a machine-acceptable format is the first step. Once the signal
has passed through the analog-to-digital converter, an attempt is
made to bound the signal. Accurate detection of the boundaries of a
signal is essential if recognition is to be achieved. Because the entire
spectrum of the signal may not be required, an algorithm is employed
to isolate the essential signal characteristics. The remainder of the
signal is discarded in a process known as data compression. The
probability that two utterances of a word or phrase are identical is
remote. All recognizers, then, must be capable of tolerating slight
variances in speech, pitch, intonation, and pause length. The filtering
or “normalizing” process allows for a range of signal variability. The
more robust the recognizer, the greater the variance it tolerates.
Depending upon the mode (learning or recognition), an attempt is then
made either to add the signal to a vocabulary or to match the sound
against an existing vocabulary.
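
The signal-bounding step described above can be illustrated with a small
sketch. The Python fragment below is only an illustration under stated
assumptions, not the algorithm of Figure 2.2 or of any particular
recognizer: it bounds an already-digitized signal by comparing the
short-time energy of each frame against a threshold. The function name,
the frame length, and the threshold value are hypothetical.

```python
def find_utterance_bounds(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices of the utterance, or None if silent.

    samples   -- digitized amplitudes (the output of the A/D step)
    frame_len -- samples per analysis frame (assumed value)
    threshold -- mean-square energy above which a frame counts as speech
                 (assumed value)
    """
    active = []  # start indices of frames whose energy exceeds the threshold
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:
            active.append(i)
    if not active:
        return None
    start = active[0]
    end = min(active[-1] + frame_len, len(samples))
    return start, end


# Silence, a short burst of "speech", then silence again.
signal = [0.0] * 8 + [0.9, -0.8, 0.7, -0.9, 0.8, -0.7, 0.6, -0.8] + [0.0] * 8
bounds = find_utterance_bounds(signal)
```

A fixed threshold of this kind would be expected to fail in a noisy
compartment; it is shown only to make the idea of bounding concrete.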

Algorithms used to match the signal have been a major
research area, with increased speed and accuracy as primary goals.
Generally, though, matching is achieved by computing the distance
between the incoming pattern and each previously stored reference
pattern. The reference pattern with the minimum distance is judged the
winner.
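
Minimum-distance matching can be sketched in a few lines. The Python
fragment below is a simplified illustration, not the matching algorithm
of any specific recognizer: each pattern is assumed to be a short list
of numeric features, patterns are brought to a common length by simple
index picking, and Euclidean distance is assumed as the distance
measure. All names and template values are hypothetical.

```python
import math


def normalize_length(pattern, size=4):
    """Resample a feature sequence to a fixed length by index picking."""
    n = len(pattern)
    return [pattern[min(n - 1, (i * n) // size)] for i in range(size)]


def distance(a, b):
    """Euclidean distance between two equal-length feature sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(incoming, templates, size=4):
    """Return (word, distance) for the stored template closest to the input."""
    probe = normalize_length(incoming, size)
    best_word, best_dist = None, float("inf")
    for word, reference in templates.items():
        d = distance(probe, normalize_length(reference, size))
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word, best_dist


# Hypothetical reference patterns created during training.
templates = {
    "ENTER": [1.0, 3.0, 3.0, 1.0],
    "DIAL": [4.0, 2.0, 1.0, 0.5],
}
# An incoming utterance spoken more slowly than the stored template.
word, dist = recognize([1.1, 1.0, 2.9, 3.1, 3.0, 2.8, 1.2, 0.9], templates)
```

A practical recognizer would generally also reject the input when even
the smallest distance is too large, rather than always declaring a
winner.
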
Figure 2.2

Speech  Recognition  Components 

3.  Categories  of  Recognizers 

Research and commercial endeavors have combined to
develop a variety of recognizers, which are designed to satisfy specific
application requirements. Table 2.2 [Ref. 3:p. 503] compares and
contrasts in simple terms the functionality of some of the most
commonly found voice recognizer types. Two points to understand when
evaluating any speech recognition system are the degree of speaker
independence and how utterances are parsed.
TABLE 2.2

VOICE RECOGNITION CATEGORIES


Category                                 Speech Mode  Vocabulary Size  Language

Word Recognition (WR)                    Isolated     10 to 300        command-like
Connected Speech, Restricted (CSR)       Connected    30 to 500        restricted command language
Speech Understanding (SU)                Connected    100 to 2,000     English-like
Unrestricted Speech Understanding (USU)  Connected    1,000 to 10,000  English-like
Unrestricted Speech                      Connected    Unlimited        English

Speech systems today are either speaker independent or
speaker dependent. The more common speaker-dependent systems
require the user to train the system prior to use. Training
typically involves creating a personal template signal for each word in
the vocabulary, which ensures that consistent input will be accepted
regardless of individual speaker characteristics. Unfortunately, for
connected speech systems with large vocabularies, this can become a
time-consuming process. Speaker-independent systems employ a standard
template against which all speech is compared. The cost is generally a
more restricted vocabulary and lower overall recognition rates.

Utterance parsing governs how the recognizer algorithm will
dissect the utterance. In isolated systems the recognizer has no
syntactic knowledge source; thus each utterance is viewed singularly.
Examples would be the commands “ENTER” or “DIAL.” Short macro
phrases are also possible in isolated systems. For example, a
recognizer could be trained to recognize and subsequently execute the
command “DIAL HOME.” Connected systems, however, view the
speech in terms of a syntax; thus strings of commands or words may be
spoken in a connected pseudo-language that is a subset of a true
language (e.g., English) for a particular environment (e.g., a CATCC). An
example might be the command “DIAL PULSE FOUR ZERO EIGHT
FIVE FIVE FIVE ONE TWO ONE TWO LOG IN GUEST.” An extended
variant of connected systems is the class of systems that recognize in a
continuous fashion, typically according to some natural language syntax.
Speech-to-text applications typically employ a continuous recognizer. As a
general rule, the more powerful and complete the syntax is, the more
natural the interface will be. Figure 2.3 summarizes the key
differences between the competing approaches.


                     ISOLATED                             CONNECTED

SPEAKER-DEPENDENT    • simple to implement                • increased training
                     • low hardware cost                  • short phrases to natural language
                     • restricted to isolated utterances  • based on syntax
                     • high recognition rates

SPEAKER-INDEPENDENT  • limited application                • most natural; powerful
                     • small vocabulary                   • response could be slow
                     • variable recognition rate          • recognition rates highly variable

Figure 2.3

Speech Recognition Trade-Offs

B. SPEECH APPLICATIONS
1. General

A parallel may be drawn between the development of
telegraph/telephone systems and computers in general. When electronic
communication was initially made possible, the input modality was
a key contact that transmitted a code representing a letter. The
telegraph quickly outpaced the rival Pony Express, and so this primitive
keyboard became a means of communication within a system. The
discovery by A. G. Bell that human speech could also be transmitted via
wire caused the replacement of the keyboard with a voice-actuated
receiver/transmitter as the primary means of transmitting
short-duration messages.

Why did this occur? The primary reason is that, despite our
sophistication, voice remains our most natural communication
medium. A prime example is how the US Navy has struggled with
alternative mechanisms for over two centuries: signal flags with coded
meanings, flares, and signal lights. But, given a choice, man generally
prefers voice communication. Keyboards are an outgrowth of the
typewriter and telegraph technologies, but they, too, are limited by the
skills of the operator.

Numerous studies have shown that voice recognition systems
are faster and more accurate than most manual-entry systems.
Additionally, voice systems free the operator’s eyes and hands to
accomplish concurrent tasks. Unencumbered by a keyboard or a mouse, the
operator is generally free to move about while speaking to the system.
Table 2.3 [Ref. 4:p. 36] compares the relative advantages and
disadvantages of speech in the military command and control environment.

In general, applications that are most likely to benefit from
voice recognition input are those that have one or more of the
following characteristics [Ref. 2:pp. 4-8]:

1.  Small  working  vocabulary 

2.  Well  structured  syntax 

3.  Operator’s  hands/eyes  otherwise  occupied 

4.  Reduced  lighting  conditions 

5.  Application  requires  other  electronic  communication  (radio, 
telephone,  etc.) 

2. Commercial Applications

With the increasing sophistication and decreasing cost of speech
systems, commercial applications have surfaced. The variety
of applications is limited only by the imagination. But in the
commercial environment, voice input is generally being used for one
primary purpose: to increase individual productivity.

A typical commercial speech application is in the area of
quality control and inspection. Such a system has been used by the
Owens-Illinois Corporation since 1973. This isolated word application
starts with the inspector entering, via voice, general shift information,
employee number, and item type to be inspected. Then the operator
conducts the inspection (hands occupied), calling out only the
essential measurements. In a similar system, an automobile manufacturer
TABLE 2.3


ADVANTAGES AND DISADVANTAGES
OF SPEECH I/O FOR C2

ADVANTAGES 

Engineering 

1.  Can  be  faster  than  other  modes  of  communication. 

2.  Can  be  more  accurate  than  other  modes  of  communication. 

3.  Compatible  with  existing  communications  systems. 

4.  Can  reduce  manpower  requirements. 

Psychological 

1.  Most  natural  form  of  human  communications. 

2.  Best  for  group  or  team  problem  solving. 

3.  Universal  (or  nearly  so)  among  humans. 

4.  Can  reduce  visual  information  overload. 

5. Increases in value when also involved in cognitive-type processes.

Physiological

1.  Requires  less  effort  and  gross  motor  activity  than  other  modes. 

2.  Frees  hands  and  eyes. 

3.  Permits  multimodal  operation. 

4.  Is  feasible  in  reduced  lighting. 

5.  Permits  operator  mobility. 

6.  Contains  information  about  physical  and  emotional  state  of  speaker. 

DISADVANTAGES 

Engineering 

1.  Interference  from  competing  acoustic  signals. 

2.  Environmental  conditions  can  alter  speech  signal. 

3.  Requires  use  of  microphone,  a  tool  with  which  many  users  may  not  be 
familiar. 

Psychological 

1.  Loss  of  privacy. 

2. Psychologically induced changes in speech characteristics.

Physiological

1.  Increased  mental  loading. 

2.  Fatigue  from  prolonged  speaking. 

3. Temporary physical ailments (e.g., colds) may alter speech
characteristics.
drew the following conclusions about voice input following a two-week
experiment [Ref. 5:p. 497]:

1.  Voice  recognition  accuracy  was  at  an  acceptable  level. 

2. Minimal operator training (less than one day) was required.

3.  Using  the  system  did  not  interfere  with  task  performance. 

4.  Operators  were  comfortable  with  the  system. 

5.  A  wireless  microphone  would  allow  complete  operator  freedom. 

Other  commercial  applications  that  have  been  successfully 
installed  include:  voice  applications  of  process  control,  warehousing 
functions,  automated  material  handling,  and  parts  programming  for 
machine  tools  [Ref.  5:pp.  496-500].  Of  particular  importance  are  the 
environments  in  which  these  commercial  systems  have  been  used. 
Commercial  applications  have  not  been  restricted  to  quiet,  stable 
environments  operated  by  a  highly  trained  speech  specialist.  Rather, 
in  many  cases,  these  systems  have  been  successfully  introduced  into 
such  severe  environments  as  airline  baggage  handling  areas,  assembly 
lines,  factories,  and  warehouses. 

3.  Military  Applications 

The employment of speech in a number of mission-critical
military systems has increased dramatically in the last decade. Speech
recognition research and development has been largely supported by
military organizations. Military speech recognition research efforts
have been focused on three primary areas: command and control
(C2), messaging systems, and low-bit-rate communications [Ref. 4:p.
35]. Because of the applicability to our study, we will focus our
attention on military C2 applications.

Increasing the sophistication of our combat systems has not
come without a price. The multifunctional nature of the typical
operational environment has dramatically increased the complexity of most
systems fielded in the last decade. For example, today’s
high-performance tactical aircraft only remotely resemble their Korean- and
even Vietnam-era counterparts. Aircrews are challenged by the increased
complexity of the mission, which translates into an increased number
of on-board systems requiring detailed attention. Each new system
installed diverts the aircrew’s attention from events outside the
cockpit to those occurring inside. Aircrews today are nearly saturated
with visual, aural, and manual input sources.

In the late 1970s, cockpit designers became aware of the
problem and endeavored to improve the man-machine interface. Live
tests demonstrated advanced avionic systems for displaying
information, heads-up displays (HUD), and the use of voice recognition.
Sorely needed improvements to cockpit displays and systems,
combined with the HUD, allowed members of the aircrew to focus their
attention outside the cockpit. By using voice recognition, aircrews
could query the status of specific mission-critical systems without
having to reference cockpit displays. These tests, although the
techniques are not currently standard practice, showed that pilots using
isolated word voice recognition commands could aurally obtain airspeed,
fuel-state, altitude, and ordnance information.
Again, these systems were employed in one of the most
severe military environments. Designers have been able to overcome
the combined effects of g-forces, vibration, and the distortion caused
by oxygen masks, successfully implementing isolated word voice
recognizers. Connected speech systems are the next generation to be
installed, giving the aircrew an even more natural and flexible
interface. Table 2.4 [Ref. 6:p. 310] delineates candidate military
applications of speech technologies.

TABLE  2.4 

POTENTIAL  MILITARY  APPLICATIONS  OF  VOICE  I/O 


1.  SECURITY 

•  Speaker  verification 

•  Speaker  identification 

•  Recognition  of  spoken  codes 

2.  COMMAND  AND  CONTROL 

•  System  control  (displays,  fire  control,  aircraft) 

•  Computer  control 

•  Material  handling 

•  Remote  vehicle  control 

3.  DATA  TRANSMISSION  AND  COMMUNICATION 

•  Speech  synthesis 

•  Scrambling/Ciphering

•  Messaging 

4.  PROCESSING  DISTORTED  SPEECH 

•  Diver  speech 

•  Astronaut  communication 

•  Speech  through  protective  or  oxygen  masks 


C.  DESIGN  ISSUES 

In  section  B,  above,  we  highlighted  some  advantages  and 
disadvantages  of  speech  systems.  With  these  in  mind,  we  can  start 
considering  specific  design  issues.  Figure  2.4  is  a  block  diagram  of 
specific  issues  identified  by  Lea  [Ref.  2:p.  83].  We  will  examine  each  of 
Lea’s  issues  in  turn. 


[Block diagram: APPLICATION, HUMAN FACTORS, LANGUAGE, ENVIRONMENT, PERFORMANCE]


Figure 2.4

Application Design Issues


Application issues must be considered, as in the development of
any system. These application criteria roughly equate to general
design specifications. For example, what is the required response
time? What is the minimum acceptable recognition rate? How
reliable must the system be in terms of mean time between failures?

Human factors issues are of primary concern in most speech
systems. If a clear advantage in terms of ease of use, accuracy, or
efficiency cannot be shown over alternative modalities, then perhaps voice
input is not appropriate. The designer must consider human factors
issues associated with training and the potential for problems in
training users, particularly in a connected speech system with a
restricted syntax. Users, however, must be aware that the nominal
time required to train a system is insignificant when compared to
long-term productivity growth. Regardless of how natural speech is,
users may still resist a speech system, preferring instead a status quo
alternative. Finally, and most importantly, any speech system must
integrate the user into a well-developed system of displays with
feedback available in both training and recognition modes.

Another design consideration of paramount importance is that of
the language itself. For example, is there a well-structured vocabulary
associated with the application? The application must also be studied
in terms of the most appropriate class of recognition (isolated,
connected, continuous). If a connected or continuous system is
considered, the design will require development of an appropriate syntax.

Environmental conditions are of critical concern during the
development of any speech application. The obvious concerns include
noise [Ref. 7], vibration, lighting, and g-forces [Ref. 8]. Other concerns
might be the impact of electromagnetic interference (EMI) on the
channel itself.

Two performance-related issues are recognition accuracy and
recognition tolerance. Recognition accuracy is a performance
measurement expressed as the ratio of correctly recognized words or
phrases to a base value. Recognition tolerance is the system’s ability to
correctly process speech under less-than-optimal conditions (e.g., stress,
noise, g-forces). Additionally, Lea suggests the development and design
of performance and evaluation tests. Will the test site accurately
simulate expected operating conditions? How will the recognition be
evaluated? What scoring methodology will be used? Finally, how will
voice input be measured and compared against alternative input
systems and competing recognizers?
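
The accuracy ratio described above can be made concrete with a short
sketch. The Python fragment below is one possible scoring method, not a
methodology prescribed by Lea or used by any particular recognizer. It
assumes that the base value is the total number of phrases spoken and
that a phrase counts as correct only when the recognizer output matches
it exactly; the function name and sample data are hypothetical.

```python
def recognition_accuracy(spoken, recognized):
    """Fraction of spoken phrases whose recognized output matches exactly."""
    if len(spoken) != len(recognized):
        raise ValueError("each spoken phrase needs one recognizer result")
    if not spoken:
        raise ValueError("no phrases to score")
    correct = sum(1 for s, r in zip(spoken, recognized) if s == r)
    return correct / len(spoken)


# Four test phrases; the recognizer substituted "DIAL" for the last one.
spoken = ["DIAL HOME", "ENTER", "DIAL HOME", "ENTER"]
recognized = ["DIAL HOME", "ENTER", "DIAL HOME", "DIAL"]
accuracy = recognition_accuracy(spoken, recognized)
```

Other base values are possible, such as counting individual words
rather than whole phrases, and the choice should be stated whenever
recognition rates are compared.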

We will use the above issues as a foundation
throughout our evaluation of the NOSC prototype system.

D.  CONNECTED  SPEECH  GRAMMARS 

The  heart  of  any  natural  or  near-natural  language  interface  is  the 
syntax  of  the  system.  In  this  section,  we  will  formally  introduce  the 
concept  and  notation  of  a  syntax  and  consider  the  development  of 
connected  speech  grammars  for  two  distinct  classes  of  grammars: 
Natural  Language  (NL)  Grammars  and  Phrase  Grammars  (PG). 

The most powerful class of connected speech systems is the class
that accepts natural language constructs as input. Natural Language
Processing (NLP) has been a long-term goal of speech linguists. In the
following subsection we will introduce the syntactic constructs
necessary to support NLP.

A less powerful, command-type application is the use of connected
speech in the form of short phrases. NLP is at one extreme of the
connected speech continuum; at the other are less powerful syntaxes
designed to recognize short, command-type phrases. We term the
structures for such systems phrase grammars (PG). Phrase grammars are
tailored for each application, yet they are, in contrast to NL systems,
much simpler to implement. The bulk of the research has been restricted
to the NL systems; little research has been done in the area of design
considerations for systems using PG. Following the NL grammar
section, we will propose specific evaluation criteria which may be
applied to any PG-type system.


1.  Syntax  Terminology  and  Notation 

A  syntax,  in  simplistic  terms,  may  be  viewed  as  nothing  more 
than  a  road  map  through  a  grammar.  Borrowing  from  fundamental 
language  theory,  a  syntax  is  represented  by  a  set  consisting  of: 

1.  A  start  state 

2.  A  set  of  final  states  (implying  there  may  be  multiple  final  states) 

3.  A  set  of  intermediate  states 

4.  Transitions  between  states 

Figure 2.5 is a syntactic diagram for commands needed to
play computer chess using speech input. The start state is the initial
condition (usually silence). When an utterance is detected, an attempt
is made to transition from the silence state (or node) to one of the
follow-on states. The syntax, then, is the set of legal sequences of
utterances that lead from the start state to a final state. For example,
in Figure 2.5 a legal utterance might be “MOVE ROOK TO QUEEN ROOK
3” or “STATUS CHECKMATE.” The legality of the phrase as a chess move
is not guaranteed; it is, however, a syntactically correct utterance.
The incorporation of intelligence in the syntax is a topic we will
examine shortly.
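The four elements listed above can be sketched in a few lines of Python. This is a minimal illustration only: the state names and the reduced word lists are our own assumptions and do not reproduce the full syntax of Figure 2.5.

```python
# A syntax as a start state, a set of final states, intermediate states,
# and transitions. The vocabulary is a hypothetical subset of a chess syntax.
START = "silence"
FINAL = {"done"}

# TRANSITIONS[state][utterance] -> next state
TRANSITIONS = {
    "silence": {"MOVE": "piece", "STATUS": "status"},
    "piece": {"ROOK": "to", "PAWN": "to"},
    "to": {"TO": "file"},
    "file": {"QUEEN": "file_piece", "KING": "file_piece"},
    "file_piece": {"ROOK": "rank", "BISHOP": "rank"},
    "rank": {str(n): "done" for n in range(1, 9)},
    "status": {"CHECK": "done", "CHECKMATE": "done"},
    "done": {},
}


def accepts(phrase: str) -> bool:
    """Return True if the phrase leads from the start state to a final state."""
    state = START
    for word in phrase.upper().split():
        next_state = TRANSITIONS[state].get(word)
        if next_state is None:      # no arc for this utterance from this node
            return False
        state = next_state
    return state in FINAL
```

Note that `accepts` checks only syntactic correctness; whether the move is legal on the board is outside the syntax, as discussed above.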

2.  Syntactic  Analysis  of  Natural  Language  Grammars 

Parsing the human language according to its grammatical
constructs was the first technology that had to be developed before
any NLP application could be fielded. Parsing is a technique by which
the syntactic structure of an input may be analyzed. The primary
classes of parsers used to support syntactic analysis developed by
linguists are: context free parsers, transformational parsers, and
augmented context free parsers [Ref. 9:p. 22].

Figure 2.5

Sample Connected Speech Syntax

All language parsing techniques can be analyzed in terms of
Chomsky’s language hierarchy, first proposed in 1957. Figure 2.6
outlines the overall structure of the Chomsky hierarchy for representing
grammars. Initial attempts at parsing human languages resulted in the
development of phrase structured grammars which were identical to
Chomsky Regular Grammars. Linguists quickly discovered that phrase
structured grammars were a convenient means for representing a
language but lacked the power to adequately describe a human language
(English). We shall see later that although Regular Grammars were not
sufficiently powerful, linguists were able to modify the representation
enough to increase their power.


Type  Name               Format      Remarks

0     Unrestricted       x -> y      no restrictions; most powerful

1     Context-sensitive  x -> y      where |y| >= |x|

2     Context-free       X -> y      y is a terminal or a non-terminal

3     Regular            X -> Yx     only productions allowed
                         X -> x

Figure 2.6

Chomsky Hierarchy

Context Sensitive Grammars (CSGs), which are sufficiently
powerful to represent NLs, were difficult to work with and were not
used by language developers. A long-term argument developed over
whether English required the power of a CSG, but this appears to have
become a moot, theoretical discussion as developers have
demonstrated reasonable success with alternative approaches to the
problem of analyzing languages syntactically.

Context Free Grammars (CFGs) for many applications were
the grammar of choice for developers attempting to model human
language. Figure 2.7 [Ref. 10:pp. 225-232] is a much-simplified
representation of an English language sentence structure with a
representative syntactic decomposition of a simple English sentence.
Artificial Intelligence languages such as Prolog are an effective
mechanism for developing grammars and analyzing their correctness.
The conversion of a language from a CFG to Backus-Naur
Form (BNF) and then to a Prolog format is relatively simple, as can be
seen in Figure 2.8 [Ref. 9:pp. 73-79].


S  -> NP VP

NP -> noun
NP -> art noun
NP -> art adj noun
NP -> pronoun
NP -> pronoun NP

VP -> verb
VP -> verb NP
VP -> verb NP PP
VP -> verb PP

PP -> prep NP


              S
            /   \
         NP       VP
        /  \     /  \
     art   noun verb  NP
      |     |    |  /  |  \
      A    dog   is art adj noun
                     |   |    |
                     a furry animal


Figure 2.7

Simple CFG Grammar

BNF                           CFG                  PROLOG

<S>  ::= <NP> <VP>            S  -> NP VP          s(X,Y)  :- np(X,Z), vp(Z,Y).

<NP> ::= noun | pronoun       NP -> noun           np(X,Y) :- noun(X,Y).
<NP> ::= art noun             NP -> art noun       np(X,Y) :- pronoun(X,Z), np(Z,Y).
<NP> ::= art adj* noun        NP -> art adj noun   np(X,Y) :- pronoun(X,Y).
<NP> ::= pronoun <NP>         NP -> pronoun        np(X,Y) :- art(X,W), adj(W,Z), noun(Z,Y).
                              NP -> pronoun NP

<VP> ::= verb | verb <PP> |   VP -> verb           vp(X,Y) :- verb(X,Y).
         verb <NP>            VP -> verb NP        vp(X,Y) :- verb(X,Z), np(Z,Y).
<VP> ::= verb <NP> <PP>       VP -> verb NP PP     vp(X,Y) :- verb(X,W), np(W,Z), pp(Z,Y).
                              VP -> verb PP        vp(X,Y) :- verb(X,Z), pp(Z,Y).

<PP> ::= prep <NP>            PP -> prep NP        pp(X,Y) :- prep(X,Z), np(Z,Y).

Figure 2.8

Alternative Representations of a Syntax
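The same grammar can also be checked outside Prolog. The following Python sketch mirrors the Prolog clauses of Figure 2.8: each recognizer takes a word list and a start position and returns the set of positions at which it can finish. The lexicon and the helper names (`word`, `seq`, `alt`, `is_sentence`) are our own assumptions, introduced only for illustration.

```python
# Hypothetical lexicon mapping each word to its category.
LEXICON = {
    "a": "art", "the": "art",
    "dog": "noun", "animal": "noun",
    "furry": "adj",
    "is": "verb",
    "it": "pronoun",
    "in": "prep",
}


def word(category):
    """Recognizer for a single terminal of the given category."""
    def parse(words, i):
        if i < len(words) and LEXICON.get(words[i]) == category:
            return {i + 1}
        return set()
    return parse


def seq(*parts):
    """Recognizer for parts in sequence (the right-hand side of one rule)."""
    def parse(words, i):
        ends = {i}
        for part in parts:
            ends = {k for j in ends for k in part(words, j)}
        return ends
    return parse


def alt(*choices):
    """Recognizer for alternative rules of the same non-terminal."""
    def parse(words, i):
        return set().union(*(choice(words, i) for choice in choices))
    return parse


art, adj, noun, verb, pronoun, prep = (
    word(c) for c in ("art", "adj", "noun", "verb", "pronoun", "prep"))


def np(words, i):
    return alt(noun, seq(art, noun), seq(art, adj, noun),
               pronoun, seq(pronoun, np))(words, i)


def pp(words, i):
    return seq(prep, np)(words, i)


def vp(words, i):
    return alt(verb, seq(verb, np), seq(verb, np, pp), seq(verb, pp))(words, i)


def s(words, i):
    return seq(np, vp)(words, i)


def is_sentence(text):
    """True if the whole word string can be derived from S."""
    words = text.lower().split()
    return len(words) in s(words, 0)
```

Returning a set of end positions, rather than a single position, lets the sketch follow every alternative at once, which plays the role of Prolog's backtracking.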

The next evolutionary step in grammar representation was
transformational grammar. The notion of a transformational grammar,
first proposed by Chomsky in 1957, grew out of a conviction that
Regular Grammars (RGs) and CFGs were insufficient to fully represent
English (a concern that was later proved unfounded). A
transformational grammar is based on a model consisting of two
components: a base component, which is a CFG that generates
additional, or “deep,” structures; and a transformational component,
which is a set of rewrite rules. The primary problem with
transformational grammars is that of combinatorial explosion
[Ref. 11:pp. 151-162]. The parser must consider not a single path but
rather a series of alternative paths which must be evaluated.
Transformational grammars enjoyed only limited popularity and are
rarely found in today’s applications.

The long-term winner in parsing technology appears to be
the approach with a basis in simple phrase structured grammars
which are equivalent to RGs. Since regular grammars can be
represented by finite state transition diagrams, linguists explored the
possibility of expanding the power of these diagrams while retaining
their simplicity.

The result is known as the transition network approach. A
transition network is nothing more than a series of finite-state
diagrams which are used to simulate the power of a CFG. The transition
network consists of two components: a set of states and a set of arcs.
Recursive transition networks (RTN) describe a language through
recursion by developing a separate network for each non-terminal in
the grammar. Figure 2.9, adapted from Allen [Ref. 12:pp. 41-46], is an
RTN based on a simple subset of English grammar.

Developers were generally satisfied with the simplicity of the
RTN but wanted to represent even more complex constructs. By
adding the notion of registers to record the conditions and subsequent
consequences of transiting an arc, they developed augmented
recursive transition networks (ATNs). Kaplan claims that ATNs have the
generative power of a Turing machine [Ref. 13:p. 83].

The apparent power and relative simplicity of ATNs have made
them the overwhelming choice of developers for commercial NLU
systems. Why is this so? The answer is directly related to the
additional “status” information the ATN can maintain. Each network is
allowed to maintain a variety of registers while the local network is
active. The registers maintain the status of specific syntactic
conditions as they relate to the grammar being parsed. With this added
power, the parser is more intelligent and can, based on the
grammatical rules and status of the registers, correctly parse the
sentence. The best approach to understanding the functionality of an
ATN parser is to trace through a sample sentence. Such an annotated
sample is provided in Figure 2.10 [Ref. 13:p. 83].
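The role of the registers can also be illustrated with a short Python sketch. It is a deliberately reduced ATN with only an S network and an NP network; the lexicon, the register names, and the trace labels are assumptions made for this illustration and are not the arc numbering of the referenced figure.

```python
# Hypothetical lexicon: word -> (category, features).
LEXICON = {
    "the": ("det", {}),
    "sailors": ("noun", {"number": "plural"}),
    "grog": ("noun", {"number": "singular"}),
    "drank": ("verb", {"tense": "past"}),
}


def category(words, i):
    """Category of the word at position i, or None if absent or unknown."""
    if i < len(words) and words[i] in LEXICON:
        return LEXICON[words[i]][0]
    return None


def parse_np(words, i, trace):
    """NP network: optional determiner arc (or jump), then a noun arc."""
    regs = {}                                 # registers local to this network
    if category(words, i) == "det":
        regs["DET"] = words[i]
        i += 1
        trace.append("det")
    else:
        trace.append("jump")                  # skip the determiner, consume nothing
    if category(words, i) != "noun":
        return None
    regs["HEAD"] = words[i]
    regs["NUMBER"] = LEXICON[words[i]][1]["number"]
    trace.append("noun")
    return regs, i + 1                        # pop back to the calling network


def parse_s(sentence):
    """S network: push to NP (subject), verb arc, push to NP (object)."""
    words = sentence.lower().rstrip(".").split()
    trace, regs = [], {}
    subject = parse_np(words, 0, trace)
    if subject is None:
        return None, trace
    regs["SUBJ"], i = subject
    if category(words, i) != "verb":
        return None, trace
    regs["VERB"] = words[i]
    regs["TENSE"] = LEXICON[words[i]][1]["tense"]
    trace.append("verb")
    obj = parse_np(words, i + 1, trace)
    if obj is None:
        return None, trace
    regs["OBJ"], i = obj
    return (regs if i == len(words) else None), trace
```

Calling `parse_s("The sailors drank grog.")` fills the subject, tense, and object registers as the arcs are traversed; a later arc could test those registers (for example, subject-verb number agreement) before allowing a transition.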

While ATNs have been shown to be the most promising approach
to the NLP syntax problem, they, like any other RG, can only be
programmed to accept valid, grammatically correct sentences. Poorly
formed yet meaningful sentences cannot be supported. This inability
to accept poorly formed or ambiguous input streams highlights the
limitation of syntactic parsing of a sentence. Linguists discovered
that parsing could determine only the structure of what was said, not
the meaning of the input. Another process, semantic analysis, is needed
if NLP systems are to become sophisticated enough to support the
inherent ambiguities of a natural language.

3.  Designing  Phrase  Syntaxes 

Although less glamorous, the bulk of connected speech
applications do not require the sophistication of NL grammars.
Phrase-type grammars offer some advantages over a Natural Language
system. First, PGs are simpler to implement. Second, restricting users
to a small number of acceptable phrases eases the learning required.
Finally, inputs to a well-constructed PG with a limited vocabulary will
be processed faster than inputs to a poorly designed NLP system.

TRANSITING  ACTIVE           INPUT
ARC #       REG SET          REMAINING                REMARKS

1           NONE             The sailors drank grog.  Push current reg set. Enter NP.
5           DET Reg = The    sailors drank grog.
7           CAT Reg = noun   drank grog.              Set person-num flag = plural.
8                            grog.                    Return to S/.
2           CAT Reg = verb   grog.                    Set tense flag to past.
3                            grog.                    Push current.
6                                                     Jump
7           CAT Reg = noun
            OBJ Reg = grog

Figure 2.10

An Augmented Transition Network

There are two ways to view speech system performance. The
first is the traditional method of applying a scoring algorithm and
assuming all errors are recognizer induced. The second approach is
to assume the recognizer is capable of near-perfect recognition,
viewing errors as syntax rather than recognizer failures. In analyzing
the performance of a connected speech system, we must consider the
isolated word and phrase scoring and resultant confusion matrix as a
measure of the system performance, providing a window into the
functioning of the syntax itself.

Despite the number of applications and the growing interest,
the literature is silent on design considerations for developing PGs. In
an attempt to fill the void, we have developed 10 rules for syntax
development, summarized in Figure 2.11. These rules may be applied
either to guide the design effort or to analyze a previously developed
syntax. Our objective in developing the rules was threefold. First,
improve the recognition rate by avoiding syntax-induced errors.
Second, improve processing performance by reducing the number of
alternatives a recognizer must consider at each node. Third,
incorporate human factors into the syntactic design.

We will use, as an example, a syntax which might be found in
a typical grocer’s butcher shop. The original syntax is shown in Figure
2.12. After introducing each rule, we will, if warranted, provide an
example.

1. Eliminate non-determinism.

2.  Avoid  phonemic  rhymes  within  a  node. 

3.  Minimize  nodal  branching  factor. 

4.  Discourage  indiscriminate  self-looping. 

5.  Provide  escape  from  any  node. 

6.  Eliminate  silent  jumps  to  finish. 

7.  Eliminate  nonsensical  transitions. 

8.  Limit  phrase  length  (7  ±  2). 

9.  Avoid  redundancy. 

10.  Limit  syntax  to  specific  (singular)  task. 


Figure  2.11 

Syntax  Design  Rules 


[Syntax diagram; node layout not recoverable. Words legible in the
diagram: sale, special, reduced, “silence”; pork, chicken, chitlens,
beef, turkey, veal, seafood; T-bone, pot roast, Young-Tom, cutlets,
breast, legs, thighs, loin, hamburger, butt, lobster, flounder, tuna,
swordfish; aged, half, fresh, frozen, whole, smoked, split, baked,
bone-in, fillet, boneless.]

Note:

1. Sale types allowed are “sale,” “reduced,” “special,” or none (silent jump).

2. Only “reasonable” products are produced (i.e., “turkey hamburger fillet baked”
is not reasonable).


Figure 2.12


a.  Eliminate  Non-determinism 

Non-determinism  occurs  whenever  an  utterance  appears 
on  more  than  one  path  from  a  single  node.  Syntactic  ambiguity  is  the 
result  of  non-determinism.  A  non-deterministic  syntax  cannot,  by 
definition,  be  expected  to  perform  correctly.  Figure  2.13  is  a  partial 
syntax  containing  a  non-deterministic  ambiguity.  Did  the  speaker 
intend  orange  to  be  a  color  or  a  fruit?  Without  knowing  the  context  of 
the  preceding  and  following  utterances,  it  is  impossible  to  interpret 
the  intended  meaning. 
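A syntax can be screened for this condition mechanically. The sketch below assumes a syntax stored as a mapping from each node to a list of (utterance, next node) arcs; the node names and words are hypothetical and only echo the orange example above.

```python
from collections import defaultdict


def find_nondeterminism(syntax):
    """Return {node: {utterance: [targets]}} for every utterance that
    appears on more than one path out of the same node."""
    conflicts = {}
    for node, arcs in syntax.items():
        targets = defaultdict(set)
        for utterance, next_node in arcs:
            targets[utterance].add(next_node)
        ambiguous = {u: sorted(t) for u, t in targets.items() if len(t) > 1}
        if ambiguous:
            conflicts[node] = ambiguous
    return conflicts


# Hypothetical partial syntax: "orange" leaves the start node on two paths.
SYNTAX = {
    "start": [("red", "color"), ("orange", "color"),
              ("orange", "fruit"), ("apple", "fruit")],
    "color": [],
    "fruit": [],
}
```

An empty result means that every utterance leaving a node selects exactly one path.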


b.  Avoid  Phonemic  Rhymes  Within  a  Node 

Phonemic rhymes are a leading cause of substitution-type
errors. Although sometimes unavoidable, words with similar
phonemes should not be found in the same node. In our example,
branching from the start state, “CHICKEN” could easily be confused
with “CHITLENS” (depending upon speaker pronunciation). The
ambiguity can be eliminated by finding a substitute word
(POULTRY) or by reorganizing the syntax.

c.  Minimize  Nodal  Branching  Factor 

As a general rule, the more word choices on a single
branch, the greater the possibility for substitution errors. The smaller
the branching factor, the better the performance. From node A in
Figure 2.12, for example, we can transition on any of 39 utterances.

d.  Discourage  Indeterminate  Self-Looping 

A self-loop is used when multiple occurrences of the
same set of utterances are desired. For example, the self-looping syntax
in Figure 2.14 allows a single node to generate a string of digits with
imbedded characters, ending with a character. Because the self-loop
is indeterminate, the only known fact is that the string will consist of
at least one digit and one character. Indiscriminate self-looping
increases the branching factor, increasing the probability of an error.
One approach to eliminating self-loops is to build separate nodes for
the exact number of occurrences desired. Suppose, for example, an
application required a phrase consisting of the last four digits of a
social security number followed by the person’s initials. While both
syntaxes satisfy the specification, the bottom syntax in Figure 2.14
could be expected to have a higher probability of successful
recognition. The exception to this is when self-loops are used as an
error-correction technique or on a start node.

Figure 2.14

Self-Loop Removal
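The effect of unrolling the self-loop on the branching factor can be sketched as follows. The node names are hypothetical, and the choice of exactly two initials is our assumption; the text specifies only "the person's initials."

```python
import string

DIGITS = [str(d) for d in range(10)]
LETTERS = list(string.ascii_uppercase)


def self_loop_syntax():
    """One node that loops on any digit or any letter."""
    return {"loop": DIGITS + LETTERS}


def unrolled_syntax(n_digits=4, n_initials=2):
    """A chain of separate nodes: n_digits digit nodes, then n_initials
    letter nodes, each accepting only what is legal at that position."""
    nodes = {}
    for k in range(n_digits):
        nodes[f"digit{k + 1}"] = list(DIGITS)
    for k in range(n_initials):
        nodes[f"initial{k + 1}"] = list(LETTERS)
    return nodes


def max_branching(syntax):
    """Largest number of word choices the recognizer faces at any node."""
    return max(len(choices) for choices in syntax.values())
```

Under these assumptions the self-looping node offers 36 choices at every step, while the unrolled chain never offers more than 10 choices at a digit node or 26 at an initial node.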

e.  Provide  Escape  Mechanisms  From  Any  Node 

Frustration mounts when an utterance is misspoken
(user error) or misrecognized (recognizer error) and the user is
trapped by the syntax until he or she can “talk out” to the final state.
Two correction techniques are to allow the user either to correct a
single node immediately or to bail out and start over. Both techniques
should be triggered by a single word (e.g., “CORRECTION” or “QUIT”).
Figure 2.15 includes these escape mechanisms.
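Both escape words can be sketched for a simple linear syntax. In the sketch below each node is represented by the set of words it accepts; the function name and the three-node vocabulary are hypothetical and are not taken from Figure 2.15.

```python
def run_phrase(words, slots):
    """Walk a linear syntax. `slots` is a list of sets of allowed words,
    one set per node. Returns the accepted words, or None if the phrase
    is incomplete or contains a word not allowed at the current node."""
    accepted = []
    for w in words:
        if w == "QUIT":                 # bail out and start over
            accepted = []
            continue
        if w == "CORRECTION":           # back up one node
            if accepted:
                accepted.pop()
            continue
        if len(accepted) == len(slots) or w not in slots[len(accepted)]:
            return None
        accepted.append(w)
    return accepted if len(accepted) == len(slots) else None


# Hypothetical three-node syntax: condition, product, cut.
SLOTS = [{"FRESH", "FROZEN"}, {"BEEF", "PORK"}, {"LOIN", "CUTLETS"}]
```

Because both escape words are accepted at every node, the user never has to complete an unwanted phrase in order to recover from an error.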

f. Eliminate “Silent” Jumps to the Final State

In a noisy environment, any noise may be potentially
included as part of the input to the recognizer. Attempting to
transition on “silence” to a state, particularly the final state, may
result in substitution errors due to noise. Eliminate this problem by
avoiding nodes which allow the user to follow a path through the syntax
and then opt to transition on silence to the final state. If partial
phrases are acceptable, then they should be terminated with a unique
utterance (e.g., “SEND,” “OK,” “STOP,” etc.). In our example, the
transition to the final state from the last node should be accomplished
either with an adjective in the following node or with some reserved
terminator word.


Figure 2.15

Corrected Syntax

g. Eliminate Nonsensical Transitions

In the course of developing the syntax, the designer is
apt to improve flexibility by adding words in each node. The result is
that certain combinations which do not occur in the “language” are
still available in the syntax. These unused elements of a node will
prove bothersome in the form of substitution errors. An example that
comes to mind is in the use of digits. If there is a naturally occurring
limit to a value in the application, then the digits node should be
restricted to recognize only those digits which are possible. In our
example, repeating the original digits node to represent ounces is
nonsensical because a practical limitation to the ounces would be 15
(16 ounces being a pound).

h.  Limit  the  Phrase  Length  to  7  ±  2  Words 

Seven, plus or minus two, is a set of values frequently
associated with human information processing capacity [Ref. 14:p. 52].
In this case, we propose it as a reasonable limit to phrase length. Two
distinct problems are likely to occur with longer phrases. First, there
is an increased probability of an error within the phrase. Recall
that each word recognition is treated as an independent event; the
total probability of recognizing a phrase correctly is obtained by
multiplying together the probabilities of a correct recognition for
each word. Second, there can be an increase in operator-induced errors
due to either incorrect syntax (phrase not allowed) or misspeaking.
Lengthy phrases are unnatural and would logically be harder to learn;
by enforcing a strict limitation on the phrase size we reduce the
probability the operator would become “tongue tied.” One exception
that frequently occurs is in the case of well-connected digits (e.g., a
telephone number, social security number (SSN), etc.). Depending
upon the content of the phrase, an entire string of digits might be
considered a single word.
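The multiplication described above can be written out explicitly. In the expression below, $p_i$ denotes the probability of correctly recognizing the $i$-th word of an $n$-word phrase; the per-word value of 0.98 used in the worked figures is purely an illustrative assumption, not a measured rate.

```latex
\[
  P(\text{phrase}) \;=\; \prod_{i=1}^{n} p_i ,
  \qquad\text{and if } p_i = p \text{ for every word,}\qquad
  P(\text{phrase}) \;=\; p^{\,n} .
\]
% Worked figures for the assumed value p = 0.98:
\[
  0.98^{5} \approx 0.904, \qquad
  0.98^{7} \approx 0.868, \qquad
  0.98^{9} \approx 0.834 .
\]
```

Even with a high per-word rate, each added word lowers the probability that the whole phrase is recognized correctly, which is the first of the two problems noted above.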

i. Avoid Redundancy

In order to increase system performance and to optimize
the man-machine interface, voice input should not be used when
alternative sources of information are available. For example, in our
butcher shop there is little need for repeating the weight information
displayed from an electronic scale. Ideally, the butcher would
optimize the application by only using speech to identify the product
and its attributes and use a scale to provide the weight information.
Control would be obtained by capturing the weight at the command “GO.”
Generally, a phrase should only include information that is not
available from other sources.

j.  Limit  Syntax  to  a  Specific  Task 

Essentially, this rule suggests that the designer scope the
syntax to a specific task. If there are related tasks with similar
phrases, then we suggest a task-specific syntax be developed for each
task. The fundamental concern is, again, with pruning the syntax so
that only the essential, minimum set of transitions remains valid
within the syntax. For example, in our butcher shop, suppose there
were two separate lines maintained because of local sanitary
restrictions: one for seafood only, the other for all other products.
The syntaxes for both applications, while similar, would have different
vocabularies. The seafood line would not include references to beef or
poultry products, with similar restrictions as appropriate for the “all
others” line. It is important to note that while the vocabularies differ,
the syntaxes should be as similar as possible since the same individual
performs both tasks.

Figure 2.15 amplifies and supports the preceding discussion
by redesigning the original syntax according to the ten design
guidelines presented. Can we predict how dramatic the change would be?
Suppose we assume that for every increase of 3 in the nodal branching
factor, the probability of a correct recognition decreases by 1 percent.
We can then estimate the correct recognition probabilities for each
node. Assuming we view the transition from each node as a discrete and
independent event, we could estimate a comparative recognition
probability for each syntactic phrase. These probabilities are found in
the respective figures. The original syntax has a probability of
approximately .84, while the redesigned syntax achieves an expected
recognition rate of .94. The overriding concern in connected speech
systems, then, is to strive for improved recognition rates through
careful syntactic design.
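The estimate can be sketched as a short calculation. The sketch assumes a baseline of perfect recognition at a branching factor of zero and a linear loss of 1 percent per 3 branches; the branching factors in the example lists are hypothetical and are not the values behind the .84 and .94 figures quoted above.

```python
import math


def node_probability(branching_factor, loss_per_step=0.01, step=3):
    """Assumed model: each `step` branches costs `loss_per_step` of the
    probability of a correct recognition at that node."""
    return 1.0 - loss_per_step * branching_factor / step


def phrase_probability(branching_factors):
    """Treat each node transition as an independent event and multiply."""
    return math.prod(node_probability(b) for b in branching_factors)


# Hypothetical branching factors along one phrase path in each syntax.
ORIGINAL = [4, 39, 12]
REDESIGNED = [3, 7, 6, 5]

if __name__ == "__main__":
    print(f"original:   {phrase_probability(ORIGINAL):.3f}")
    print(f"redesigned: {phrase_probability(REDESIGNED):.3f}")
```

The point of the sketch is the comparison, not the absolute numbers: a path through several small nodes can score higher than a shorter path through one very large node.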
III. CATCC OPERATIONS


This chapter introduces the Carrier Air Traffic Control Center
(CATCC) to readers unfamiliar with it. Specific topics include
mission description, organization, fundamental information flow, and
the operating environment. The scope of this discussion will be
limited to the background needed to understand the overall nature of
the application. It is not intended to serve as a requirements
definition or a functional description. Readers familiar with CATCC
operations may omit this chapter without loss of continuity.

A.  ORGANIZATION  AND  MISSION 

The two primary organizations within the CATCC are Air
Operations (AirOps) and Carrier Controlled Approach (CCA). We will
briefly examine both of these organizations.

The principal function of AirOps is to coordinate all flight
operations for all airborne aircraft. Major tasks include:

1.  Prepare  the  air  plan. 

2.  Brief  ready  rooms. 

3.  Coordinate  with  divert  airfields. 

4.  Monitor  launch  and  recovery  operations. 

5.  Maintain/display  aircraft  status  and  mission  information,  as 
required. 

6. Coordinate diversion of airborne aircraft.

Figure 3.1 is a typical layout of AirOps. Pages 117 through 120 of
Appendix A are sample layouts of the status boards and a description of
the acronyms associated with each of the boards. AirOps is headed by
an Air Operations Officer and manned with approximately eight sailors.
Information needed to update status boards, internal and external to
AirOps, is relayed by operators using sound-powered communication
systems. This information is duplicated for at least 50 individuals
throughout the ship [Ref. 15:p. 1]. Frequent human error and
untimely transmission of the information throughout the ship
adversely affect accomplishment of the AirOps mission.

The  CCA’s  function  is  to  provide  for  the  safe  and  effective  control 
of  airborne  aircraft.  The  CCA  is  specifically  tasked  with  controlling  all 
aircraft  within  a  50-mile  radius  of  the  carrier  and  for  the  recovery  (i.e., 
safe landing on the carrier) of all aircraft operating under night and/or
Instrument Flight Rules (IFR) conditions. Major tasks include:

1.  Control  aircraft  departures,  marshal,  approach  and  final 
approach. 

2.  Display  and  disseminate  aircraft  status  information,  as  required. 

3.  Monitor  launch  and  recovery  operations. 

CCA manning includes a CCA officer, assisted by a CCA supervisor.
Additionally there are Marshal, Approach, Departure, and Final
controllers. Approximately 10 individuals are needed to man the CCA.
Information needed by other organizations is distributed via the same
sound-powered phone system. Specific problems typically include
transferring fuel state and approach information to AirOps in a timely
and accurate manner. Figure 3.2 is a layout of a typical CCA. Pages
121 through 127 of Appendix A are the status boards maintained by
CCA personnel.

Figure 3.1

Marshal controller duties include tasks that ensure the orderly
control and separation of aircraft awaiting approach to the carrier.
The Marshal controller must issue to the approaching aircraft the
following information:

1. Recovery type.

2. Marshal radial, distance, and altitude.

3. Expected Approach Time (EAT).

4.  Time  check. 

5.  Weather  information. 

6.  Expected  final  bearing. 

7.  Approach  frequency  (button). 

This  information  is  also  displayed  in  the  CCA  and  communicated 
to  other  locations. 

Departure  and  Approach  controllers  are  responsible  for  the  safe 
control  of  aircraft  departing  or  approaching  the  ship.  Information 
associated  with  these  events  includes  departure  or  first-approach 
times, radio frequencies, aircraft status, and fuel state.

1 - GENERAL INFORMATION
2 - GENERAL INFORMATION
3 - MARSHALL STATUS BOARD
4 - MARSHALL STATUS BOARD
5 - APPROACH STATUS BOARD
6 - APPROACH STATUS BOARD
7 - GLOW BOARD
8 - DIVERT BOARD

Figure 3.2

Carrier Controlled Approach

B. ENVIRONMENT

The at-sea operating environment is hostile to sensitive
computer-based systems. In the following paragraphs we will examine
some potential problems that can be anticipated in operating any
system in the CATCC.

1.  Power 

Periodic fluctuations and losses of power are not an
uncommon occurrence aboard any naval vessel. Commercial computer
equipment, sensitive to power fluctuations, must have hardware and
software protection systems to support unexpected losses or changes
in current.

2.  Vibration 

A  carrier  during  flight  operations  is  subject  to  two  kinds  of 
periodic  vibrations:  (1)  vibrations  associated  with  being  underway,  and 
(2)  vibrations  caused  by  the  launch  and  recovery  of  high-performance 
jet  aircraft.  Vibrations  transmitted  through  deck  plates  and  bulkheads 
affect  all  shipboard  systems.  Sensitive  systems  must  be  protected 
from  vibration  by  a  combination  of  ruggedization  and  shock  mounting. 

3.  Illumination 

The  CATCC  operates  in  a  reduced  lighting  mode  to  enhance 
the  contrast  of  radar  displays.  Operators  must  be  able  to  operate  their 
systems  without  the  need  for  additional  lighting. 

4.  Space 

Space aboard a combatant vessel is at a premium. The CATCC
is no exception. Discretionary space in the CATCC is at an absolute
minimum; there is barely sufficient area for the personnel and systems
installed.

5.  Noise 

Noise sources in an operating CATCC include electronic
“white” noise, noises associated with flight operations, radio
transmission and other speaker noises, and human conversation.

6. Electro-Magnetic Interference (EMI)

The many electronic systems operating in close
proximity are subject to spurious and unwanted EMI. The results of
EMI, if not anticipated, are unusual and seemingly inexplicable
losses of data or changes in system operating characteristics.

7.  Ventilation 

Poor ventilation systems hinder the removal of heat
generated by electronic components. Additionally, the lack of
circulation hampers removal of smoke and dust particles which adversely
affect sensitive devices such as magnetic tapes, magnetic diskettes,
and computer read/write heads associated with mass storage devices.

The  seven  environmental  factors  alone  are  not  sufficient 
when  considering  installation  of  a  system  at  sea.  Systems  installed 
must,  for  instance,  be  able  to  withstand  reasonable  operator  abuses 
(liquid  spills,  rough  handling,  etc.).  Systems  must  also  be  capable  of 
being maintained and operated by carrier personnel. Embarked sailors
should be able to perform low-level maintenance of hardware and
software as required. More extensive maintenance may require on-site
contractor support at naval bases and repair facilities.
IV.  THE NOSC PILOT SYSTEM


A.  GENERAL 

The  pilot  system  designed  and  developed  by  NOSC  (code  441) 
was  intended  primarily  as  a  vehicle  for  validating  the  concept  of  auto¬ 
mating  updates  to  the  CATCC  status  boards.  The  pilot  system,  as 
delivered,  was  not  developed  as  the  final  solution  to  the  application.  It 
is, however, a first attempt at evaluating alternative architectures,
application software, and voice recognition systems. The pilot system
can provide valuable insight for future prototype development, if
warranted. It must be stressed that the system
evaluated  at  NPS  has  not  been  installed  in  an  operational  at-sea  test 
environment. 

The  hardware  and  software  provided  during  the  test  may  never  be 
actually  implemented  in  the  final  system.  As  key  components  in  the 
pilot  system,  they  are,  nonetheless,  valuable  for  establishing  a  baseline 
of experience upon which the application can be developed. In
particular, we recognize that the installed voice recognition system and
supporting software are pre-production versions made available to
select research organizations. It is with that understanding that we
examine the system, as installed, in the following sections. Except
where noted, the system was intentionally evaluated “as delivered.”
Deviations  were  limited  to  those  that  would  directly  support  the 
research  effort. 
The chapter will start with an overview of the hardware components,
followed by a more detailed review of the recognizer used. We
will  then  examine  the  software  components  and  syntax  design.  This 
chapter  will  close  with  an  orientation  to  the  operational  procedures 
involved  in  training  and  operating  the  system. 

B.  HARDWARE  DESCRIPTION 
1.  Overview 

The  system  is  based  upon  a  Sun  Microsystems  model  3/160 
multi-user  mini-computer.  Configured  around  the  successful  VME 
architecture  and  supported  by  a  Motorola  32-bit  M68020  micropro¬ 
cessor,  the  Sun  system  is  designed  for,  and  capable  of  supporting, 
multiple  users  in  a  wide  variety  of  applications.  As  delivered,  the  Sun 
system  has  a  mass-storage  device  capable  of  storing  142  MB 
(megabytes)  of  information.  In  addition,  the  installed  system  was 
equipped  with  8  MB  of  Random  Access  Memory  (RAM).  A  mass-stor¬ 
age  tape  back-up  was  available  for  archiving  files. 

Connected  to  the  system  via  RS-232  connectors  were  six 
WYSE  model  60  ASCII  terminals,  three  of  which  were  equipped  with 
standard keyboards for input. These terminals have a 14-inch
amber-on-black display screen. The purpose of each display will be
covered when we discuss the status boards.

In  addition  to  the  six  ASCII  terminals,  a  Sun  workstation  was 
included.  The  Sun  workstation  is  a  black-on-white  19-inch  display 
capable  of  supporting  high-resolution  graphics  and  multiple  windows. 
This workstation is the primary terminal and was used in this
application for training, testing, and operation of the status boards.
Associated with the terminal was a light-driven mouse pointing device
supported by the SUNTOOLS software. Menu selection and window
control functions were the primary mouse-driven events.

Printed  output  was  produced  by  a  Texas  Instruments  dot¬ 
matrix  printer  connected  to  one  of  the  Sun’s  printer  ports. 

NOSC  provided  Shure  model  10  headsets  with  a  Hewlett- 
Packard  model  465A  amplifier  as  recognizer  voice  input  devices. 
These headsets, while suitable for low-noise conditions, proved
unusable above 65 dBA of noise. A substitute Plantronics SNC 1436
noise-cancelling microphone was provided by ITT's Defense Communication
Division (DCD) for the duration of the NPS evaluation. The use of the
Plantronics  headset  eliminated  the  need  for  additional  amplification  of 
the  input  signal,  allowing  removal  of  the  HP  amplifier. 

Figure  4.1  is  an  overall  diagram  of  the  hardware  architecture 
we  evaluated.  We  stress  that  this  is  only  the  initial  configuration.  The 
system  may  be  expanded  to  include  additional  Sun  computers,  work¬ 
stations,  and  display  terminals  supported  via  an  Ethernet  network. 
Figure  4.2,  provided  by  NOSC,  is  a  system  architecture  to  which  the 
system  may  ultimately  evolve. 

Figure 4.1

Pilot System Architecture, Tested

Electronic Status Board System Design

2.  The Voice Recognition System

An ITT VRS 1280/VME was the voice recognizer included in
the system. The VRS 1280 architecture includes its own M68000
processor and thus is not reliant on any external processor to support
recognizer operations. The overall architecture is diagrammed in
Figure 4.3 [Ref. 16:p. 11]. A summary of system features is found in Table
4.1 [Ref. 16:p. 14]. Template matching calculations are performed in
the Dynamic Time Warping circuitry. While the exact technologies
used by the recognizer are proprietary, the VRS 1280 Product
Description does provide the following insight:


Figure 4.3

ITT DCD VRS 1280/VME Architecture

(Block diagram components: audio in, codec, TMS-32020, 68000, CVSD,
synthesis audio out, recorded audio in, DTW circuit, VME bus, RS232.)


ITT DCD’s approach to speech and speaker recognition is based on a
powerful kernel technology for the basic pattern matching algorithm.
This kernel technology is referred to as the Template
Determined Endpoint Detection (TDEP) algorithm. ... the ITT DCD
algorithm does not employ any technique to explicitly detect where
words begin and end prior to any pattern matching computations,
thus eliminating a major source of recognition errors. [Ref. 16:p. 1]

A  continuous  matching  algorithm  compares  the  incoming 
signal  against  known  vocabulary  words,  background  noise  templates, 
and  phoneme  templates,  allowing  for  the  identification  of  both  speech 
and non-speech signals [Ref. 16:p. 2]. The syntax can be adjusted to
support a variety of speech styles ranging from phrases without pauses
to phrases with embedded pauses of user-determinable length and
location. 
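Because the recognizer's own matching technique is proprietary, the following is only a minimal sketch of textbook dynamic time warping (DTW), the general family of template matching named above. It is not the ITT TDEP algorithm: it assumes the utterance endpoints are already known, which TDEP specifically avoids. The function names and the toy one-dimensional "feature frames" are hypothetical and chosen for illustration.

```python
import math


def dtw_distance(template, utterance):
    """Classic DTW cost between two sequences of feature frames.

    Each frame is a tuple of floats; all frames share one dimension.
    A lower cost means the utterance warps onto the template more easily.
    """
    n, m = len(template), len(utterance)
    # cost[i][j] = cheapest alignment of template[:i] with utterance[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(template[i - 1], utterance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # template frame repeated
                cost[i][j - 1],      # utterance frame repeated
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def best_match(templates, utterance):
    """Return the vocabulary word whose template has the lowest DTW cost."""
    return min(templates, key=lambda word: dtw_distance(templates[word], utterance))


# Hypothetical one-dimensional templates for two vocabulary words.
templates = {
    "one": [(0.0,), (1.0,), (2.0,)],
    "two": [(5.0,), (5.0,), (5.0,)],
}
# The same shape as "one", spoken more slowly (frames repeated).
slow_one = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
print(best_match(templates, slow_one))
```

The point of the warping step is that a word spoken faster or slower than its training template can still match, since frames may be stretched or compressed along the time axis.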




TABLE 4.1

MODEL VRS 1280/VME DETAILED SPECIFICATIONS

Recognition
  Vocabulary Capacity:   • 500 unique words
                         • 1280 sec. of speech (RAM)
                           (approximately 2000 words)
  Throughput Capacity:   • >500 seconds of speech
                           (approximately 800 words)
  Mode:                  • Speaker dependent
                         • Continuous or isolated words
                         • Syntaxed as required
  Response Time:         • .25 second (avg.)
  Training:              • One or more repetitions of each
                           vocabulary word for initial training
                         • Easily updated if necessary to
                           accommodate changes in the
                           speaker’s voice

Synthesis
  Algorithm:             • CVSD
  Rate:                  • 16 Kbps
  Capacity:              • 64 seconds of speech capacity in
                           on-board RAM (additional vocabulary
                           can be stored off-board)
                         • Simultaneous with recognition

Record/Playback          • Record/playback function supported
                           with CVSD analysis/synthesis; two
                           2-second buffers provided (also used
                           for inputting messages to be
                           synthesized)
                         • Simultaneous with recognition

Analog I/O
  In:                    • Line input (0 dBm, 600 Ω)
  Out:                   • Line output (0 dBm, 600 Ω)

                         • VME bus, RS232

Physical Size:           • Double-sized extended (233.3mm x
                           220mm) VME board form factor
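The capacity figures in Table 4.1 can be cross-checked with simple arithmetic. The short sketch below is illustrative only; it assumes that "16 Kbps" means 16,000 bits per second, and the function names are hypothetical.

```python
CVSD_BITS_PER_SECOND = 16_000  # assumption: "16 Kbps" = 16,000 bits/s
SYNTHESIS_SECONDS = 64         # on-board synthesis capacity from Table 4.1


def synthesis_buffer_bytes(rate_bps=CVSD_BITS_PER_SECOND, seconds=SYNTHESIS_SECONDS):
    """Bytes of RAM needed to hold the stated seconds of CVSD-coded speech."""
    return rate_bps * seconds // 8


def seconds_per_word(total_seconds, word_count):
    """Average template duration implied by a capacity stated both ways."""
    return total_seconds / word_count


print(synthesis_buffer_bytes())        # RAM for 64 s of synthesized speech
print(seconds_per_word(1280, 2000))    # storage capacity figures
print(seconds_per_word(500, 800))      # throughput capacity figures
```

Under that assumption, 64 seconds of synthesized speech occupies 128,000 bytes, and both capacity figures imply an average word length of roughly 0.63 seconds, so the two "approximately N words" estimates in the table are mutually consistent.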
C.  SOFTWARE DESCRIPTION

Software for the system consists of both systems and applications
software. The Sun computers operate under the UNIX operating
system. In addition, the VRS 1280 is supported by ITT-supplied User
Interface Software (UIS), which is a menu-driven system for
interacting with the recognizer. Application programs, developed in the “C”
programming language by NOSC, parse outputs from the recognizer
and control the status board displays. In addition, a series of routines
was developed to automate the menu selection process for training,
testing, and operation of the ITT UIS.

Detailed  discussion  of  the  UNIX  operating  system  is  not  required 
within the scope of this research. Specific NOSC applications,
programs, and user routines will be discussed in detail in later sections.
Important  to  the  research,  however,  is  an  understanding  of  the  func¬ 
tions  available  via  the  ITT  UIS.  Many  of  these  functions  were  hidden 
from  the  user  by  NOSC  routines  developed  to  improve  and  simplify  the 
user  interface.  Nonetheless,  a  rudimentary  understanding  of  the  ITT- 
supplied  interface  is  considered  necessary  to  understanding  the  func¬ 
tionality  of  the  recognizer. 

The  UIS  consists  of  user-selectable  two-character  commands 
presented  in  a  series  of  menus.  We  will  limit  our  discussion  to  the 
most  important  commands  found  in  the  main  menu  (Figure  4.4). 
MAIN MENU

ACTION

ep    edit engineering parameter file
es    edit or create syntax file
sh    select script file
dp    select data pump file
ts    create training script file
cs    create silence template
cn    calibrate noise estimate
at    adjust templates
ct    copy template

Id    upload or download data files
Gt    enroll or train speech templates
SID   select recognition control mode

r!    clear recognizer memory and reset
a!    exit host

CMD>

Figure 4.4
Main Menu
ep:  The engineering parameter file (Table 4.2) contains a large
number of system features which allow the system to be tailored
to the application. This includes the adjustment of rejection
threshold settings, pause lengths, and gain controls. While
many of the parameters are at “factory” setting, tuning the
board to optimize performance for a specific application may be
required. Entering the “ep” command allows the user to view
the file and adjust the current parameter settings, as needed.

cn:  An ability to operate in a variety of noise conditions is a
prerequisite for most voice applications. The ITT system allows for
the calibration of the ambient noise by executing the “cn”
command from the main menu. Calibration of the noise
requires approximately 15 seconds.

at:  According to the user’s manual, templates should be created in
quiet conditions. In order to adjust the templates for the
calibrated noise, the “at” command is issued.

dt:  Before  a  recognition  session  can  commence,  both  the  syntax 
and  the  user  vocabulary  templates  must  be  successfully  down¬ 
loaded  to  the  recognizer.  Downloading  templates  (dt)  may  be 
aborted if the path is incorrect, if templates are corrupted or
missing,  or  if  there  is  a  recognizer  synchronization  problem. 

es:  A syntax file may be either created or edited by issuing the “es”
command.  ITT  software  allows  the  creation  of  a  node-based 
syntax,  each  node  consisting  of  words  which  may  be  reached 
within  the  node.  The  editor  allows  the  addition,  deletion,  and 
connection  of  nodes,  as  required,  to  create  the  desired  syntax. 
Recognizer  limitations  include  a  maximum  of  60  words  per 
node  in  a  total  of  255  nodes.  The  maximum  number  of  words  is 
400  [Ref.  17:p.  5]. 
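To make the node-based syntax concrete, the following is a minimal sketch of how such a structure might be represented and checked against the limits quoted above (60 words per node, 255 nodes, 400 words). The data layout, the names, and the traversal rule (speaking a word in a node moves to one of that node's successors) are assumptions made for illustration; they are not the format of the ITT syntax file. The start and end node numbers follow the defaults listed in the engineering parameter file.

```python
from dataclasses import dataclass, field

MAX_WORDS_PER_NODE = 60
MAX_NODES = 255
MAX_VOCABULARY = 400


@dataclass
class Node:
    words: set                                     # words that may be spoken at this node
    successors: set = field(default_factory=set)   # node numbers reachable next


def validate(syntax):
    """Check a {node number: Node} syntax against the limits; return vocabulary size."""
    if len(syntax) > MAX_NODES:
        raise ValueError("too many nodes")
    vocabulary = set()
    for number, node in syntax.items():
        if len(node.words) > MAX_WORDS_PER_NODE:
            raise ValueError(f"node {number} has too many words")
        if node.successors - set(syntax):
            raise ValueError(f"node {number} links to an undefined node")
        vocabulary |= node.words
    if len(vocabulary) > MAX_VOCABULARY:
        raise ValueError("vocabulary too large")
    return len(vocabulary)


def accepts(syntax, start, end, phrase):
    """True if the word sequence can be traced from the start node to the end node."""
    current = {start}
    for word in phrase:
        following = set()
        for number in current:
            if word in syntax[number].words:
                following |= syntax[number].successors
        if not following:
            return False
        current = following
    return end in current


# Hypothetical three-node syntax: a command word followed by one or more digits.
syntax = {
    0: Node({"update", "delete"}, {2}),       # starting node
    2: Node({"one", "two", "three"}, {1, 2}),  # digits may repeat or finish
    1: Node(set()),                            # end node
}
print(validate(syntax))
print(accepts(syntax, 0, 1, ["delete", "one", "two"]))
print(accepts(syntax, 0, 1, ["one", "delete"]))
```

Restricting which words are legal at each node is what allows a syntax to reduce the number of templates the recognizer must consider at any point in a phrase.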


D.  SYNTAX  DESIGN 

Syntax  design  was  based  on  the  vocabulary  necessary  to  operate 
the  CATCC  displays.  A  copy  of  the  combined  syntax  supplied  with  the 
system  is  found  in  Appendix  B,  page  1.  Total  size  of  the  working 
vocabulary  is  71  words  organized  into  30  nodes.  All  three  displays 
(Marshal,  Approach,  and  Departure)  can  be  supported  by  the  syntax. 
TABLE 4.2

ENGINEERING PARAMETER FILE

20000  path score rescaling threshold
 2000  offset to calculate pruning threshold
    6  max number of total options saved
 1000  option pruning threshold offset
    0  node number of starting node
    1  end node number
    1  weight assigned to downloaded templates
    3  max weight allowed for templates
    2  minimum number of training passes
    3  max # times template length is adjusted
  200  max delay allowed for results output
   -1  penalty imposed for special loop back syntax node
   -6  scale factor for relative gain term
    0  window size for relative gain
    6  programmable gain control in TMS320
    2  scale factor for mel cepstral coef 1
    2  scale factor for mel cepstral coef 2
    2  scale factor for mel cepstral coef 3
    2  scale factor for mel cepstral coef 4
    2  scale factor for mel cepstral coef 5
    2  scale factor for mel cepstral coef 6
    2  scale factor for mel cepstral coef 7
    2  scale factor for mel cepstral coef 8
   32  offset for mel cepstral coef 1
   32  offset for mel cepstral coef 2
   32  offset for mel cepstral coef 3
   32  offset for mel cepstral coef 4
   32  offset for mel cepstral coef 5
   32  offset for mel cepstral coef 6
   32  offset for mel cepstral coef 7
   32  offset for mel cepstral coef 8
    1  log likelihood rejection enable flag
   15  log likelihood rejection threshold
    1  log likelihood rejection filler training enable flag
    0  noise tracker enable flag
    0  noise tracker rejection enable flag
    3  max # times a template can be updated per training session
    0  DTW diagnostic loop forever enable flag
    0  template warping function:
   10  max length of pause nodes in special syntaxes
    1  weight assigned to enrolled templates
    0  data pump enable flag
   -1  hardware push-to-talk flag
    0  AGC enable flag
   40  delay value before first gain increase
   15  delay before each subsequent gain increase
    4  Ebar noise tracker time constant (shift value)
Operating a display, however, requires only a subset of the combined
syntax.  Although  not  implemented,  alternative  syntaxes  were  consid¬ 
ered  by  NOSC  and  are  also  found  in  Appendix  B.  These  smaller  syn¬ 
taxes  are  designed  to  support  exactly  the  specified  function,  thus 
eliminating  syntactic  overlap. 


E.  APPLICATION  SOFTWARE  OPERATION 
1.  Training  the  Recognizer 

The  ITT  VRS  1280  is  a  speaker-dependent,  connected 
speech  system.  Each  speaker  must  initially  train  the  vocabulary  for  his 
or her particular voice. This was accomplished by executing a
NOSC-developed routine called “host.” The initial screen, Figure 4.5,
prompts  the  user  for  personal  information.  Figure  4.6  is  the  initial 
training  menu  displayed  on  the  Sun  workstation. 

The  first  option  allowed  for  enrolling  and  training  of  the  dig¬ 
its  0  through  9.  When  executed,  a  series  of  ITT  interface  menus 
would  be  automatically  executed  (downloading  templates,  calibrating 
noise,  etc.).  After  approximately  30  seconds,  the  user  would  be  pre¬ 
sented  with  the  initial  digits  training  screen  found  in  Figure  4.7. 

This screen is composed of two windows which are selected
by moving the mouse-controlled cursor into the desired window.
Training of the digits involved repeating the phrase or word
immediately following the “PLEASE SAY... >” prompt. In this case, a base set
of templates existed from which the user’s utterance would be
bootstrapped. If the utterance was incorrect (generally a user word
substitution error), a “Forced recognition failure” message would
result.

Initial Digits Training Display

If the utterance was recognized but significantly different
from the base templates, a phrase recognition score would be
displayed with the message “Forced recognition” (Figure 4.8). At this
point, the user could either select the “REPEAT Forced Recognition”
option and try the phrase again or the “OK Force Recognition” option,
which required the recognizer to accept the input and force template
adjustment.  The  degree  of  template  adjustment  is  controlled  through 
the  Engineering  Parameter  File.  Typically  during  the  enrollment  pro¬ 
cess,  the  template  might  be  adjusted  by  100  percent;  as  templates  are 
adjusted  during  subsequent  refinement  processes,  the  adjustment  fac¬ 
tor  might  be  reduced  to  10  percent. 
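The effect of an adjustment factor can be illustrated with a simple weighted blend. The sketch below is an assumption about what "adjusted by 100 percent" versus "10 percent" could mean numerically; the actual update rule used by the recognizer is not documented here, and the function name and frame layout are hypothetical. It also assumes the utterance has already been time-aligned to the template, so both have the same number of frames.

```python
def adjust_template(template, utterance, factor):
    """Move each template value toward the aligned utterance by `factor`.

    factor = 1.0 replaces the template outright (as during enrollment);
    factor = 0.1 moves it 10 percent of the way toward the new utterance.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    if len(template) != len(utterance):
        raise ValueError("utterance must be aligned to the template")
    return [
        [(1.0 - factor) * t + factor * u for t, u in zip(t_frame, u_frame)]
        for t_frame, u_frame in zip(template, utterance)
    ]


stored = [[0.0, 10.0]]    # hypothetical single-frame template
spoken = [[10.0, 0.0]]    # hypothetical aligned utterance
print(adjust_template(stored, spoken, 1.0))  # enrollment-style replacement
print(adjust_template(stored, spoken, 0.1))  # later, gentler refinement
```

A large factor lets a new speaker's voice quickly displace the base templates, while a small factor during later refinement protects an established template from a single atypical utterance.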

"Results:  Open  Recognition,”  shown  in  Figure  4.9,  meant  the 
user's  utterance  was  recognized  within  specified  parameters.  As  a 
result,  the  templates  would  automatically  undergo  adjustment  and  the 
next  phrase  would  be  presented. 

Approximately  three  to  five  minutes  were  required  to  com¬ 
plete  digit  training  for  most  individuals  we  trained.  Users  could  exer¬ 
cise  limited  control  over  the  system  during  this  phase  by  executing 
one of the two-letter commands at the “CMD>” prompt.
Following initial digit training, option 2 on Figure 4.6 could
be  selected  to  create  a  set  of  templates  for  the  application  vocabulary. 
In  this  application,  a  pre-loaded  set  of  vocabulary  templates  did  not 
exist.  Each  template  was  created  as  the  recognizer  proceeded 
through  a  first  pass  of  vocabulary  words.  During  this  phase  of  the 
enrollment  process,  speakers  had  to  say  each  vocabulary  word  exactly 
as  presented.  Once  enrolled,  the  vocabulary  words  would  again  be 
refined  through  ITT  carrier  phrases  (“SAY  airborne  AGAIN”)  and  in 
the  actual  syntax  (“CHECK  IN  FUEL  STATE  THREE  POINT  ONE”). 
During  this  phase,  the  identical  interface  shown  in  Figures  4.7  through 
4.9  was  active.  Enrollment  time  for  the  vocabulary  varied  widely 
between  individuals;  the  average  individual  required  approximately  45 
minutes. 

Although  not  used  for  the  test,  option  7  from  the  training 
menu  allowed  a  user  to  train  templates  by  bootstrapping  from  a  set  of 
previously  trained  templates.  While  this  could  reduce  training  time, 
the  option  was  not  used  as  the  training  method  so  that  we  could  obtain 
templates  without  any  possibility  of  previous  bias. 

2.  Practice  Recognition 

Option 4 from Figure 4.6 allowed the user to practice using
the vocabulary and the syntax. Following selection of the “Practice
Recognizing” option, the user was presented with the screen shown in
Figure 4.10. When the microphone was open, the recognizer would
match signals against the vocabulary according to the syntax.
Following recognition, the phrase would be presented, accompanied
by a phrase recognition score. Any words not within the
predetermined threshold were marked with an asterisk. The
session would end when the user typed a “q” to quit.
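The asterisk marking described above can be sketched as follows. The per-word scores, the threshold value, and the convention that a lower score is a better match are all assumptions made for illustration; the text does not state how the recognizer scales its scores.

```python
def format_result(words, scores, threshold):
    """Render a recognized phrase, flagging words outside the threshold.

    Assumes distance-like scores: a word whose score exceeds the
    threshold is considered a weak match and gets an asterisk.
    """
    flagged = [
        word + "*" if score > threshold else word
        for word, score in zip(words, scores)
    ]
    return " ".join(flagged)


phrase = ["CHECK", "IN", "FUEL", "STATE", "THREE", "POINT", "ONE"]
scores = [4, 6, 5, 3, 21, 7, 5]   # hypothetical per-word scores
print(format_result(phrase, scores, threshold=15))
```

Marking individual words in this way tells the user which templates are candidates for retraining rather than simply reporting that the phrase as a whole scored poorly.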

3.  Retraining  Templates 

If  specific  templates  were  yielding  inconsistent  results,  they 
could  be  retrained  by  exercising  option  6  from  the  main  training 
menu.  When  selected,  the  user  would  enter  the  word  number 
requiring  retraining.  After  recalibration,  the  word  would  be  presented 
in  two  different  phrases,  which  the  user  would  repeat  as  before. 

4.  Operating  the  Displays 

A  series  of  visual  displays  designed  to  replace  selected  status 
boards  was  developed  by  NOSC.  Input  to  the  displays  could  be  accom¬ 
plished  either  via  a  combination  of  voice  and  keyboard  entry  or  by 
keyboard entry only. When operating, each status board is displayed on
a designated output terminal. The four displays supported are: Air
Operations (Figure 4.11), Departure (Figure 4.12), Marshal (Figure
4.13),  and  Approach  (Figure  4.14). 

The Air Operations status board depicted in Figure 4.11 would
have information entered via keyboard when the flight was anticipated.
Included  would  be  the  pilot  name  and  mission  type.  This  data  is  not 
part  of  the  syntax  and  thus  would  not  be  entered  via  the  voice 
recognition  system.  As  the  flight  departed,  departure  information, 
along  with  appropriate  remarks,  would  automatically  update  the  board. 
Figure 4.12 represents the Departure status board. Again, the
“talker” would enter the syntax-supported departure information via
voice  (or  keyboard).  The  event  column  would  be  filled  in  with  the 
information  available  from  the  Air  Operations  status  board. 

Approaches  are  monitored  by  the  Approach  control  board. 
Information  that  is  monitored  by  this  board  would  be  used  to  auto¬ 
matically  update  the  Air  Operations  board.  A  prime  example  is  the 
aircraft “state” (or fuel status). As changes to the state are reported by
the aircrew, they would be visually displayed on the Air Operations
display once entered by the Approach “talker.” Again, the operator
has  the  ability  to  update  his  status  board  either  via  voice  or  manual 
keyboard  entry. 

The  final  status  board  available  with  the  system  is  the  Marshal 
display.  Header  information  is  not  supported  by  the  vocabulary  and 
thus  would  be  updated  via  keyboard  entry. 

The boards are maintained via the “UPDATE...” and
“DELETE...” phrases. If an aircraft is deleted, all the information for
that side number is removed and the display is automatically
refreshed. Each operator maintains his own status board.
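The update and delete behavior can be pictured as a table keyed by aircraft side number. The sketch below is hypothetical: the class, its field names, and the rendering format are invented for illustration and do not reproduce the NOSC application code or its display layout.

```python
class StatusBoard:
    """A toy status board: one row of fields per aircraft side number."""

    def __init__(self):
        self.rows = {}  # side number -> {field name: value}

    def update(self, side_number, **fields):
        """Add or change fields for an aircraft, then refresh the display."""
        self.rows.setdefault(side_number, {}).update(fields)
        return self.render()

    def delete(self, side_number):
        """Remove all information for an aircraft, then refresh the display."""
        self.rows.pop(side_number, None)
        return self.render()

    def render(self):
        """Return the board as text, one line per aircraft."""
        lines = []
        for side_number in sorted(self.rows):
            info = ", ".join(
                f"{name}={value}"
                for name, value in sorted(self.rows[side_number].items())
            )
            lines.append(f"{side_number}: {info}")
        return "\n".join(lines)


board = StatusBoard()
board.update("105", state="3.1")
print(board.update("203", state="4.5"))
print(board.delete("105"))
```

Keying every row on the side number is what makes a single spoken delete sufficient to clear all of an aircraft's entries at once.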
V.


The  objective  of  this  experiment  was  to  evaluate  the  voice  recog¬ 
nition  accuracy  of  the  ITT  DCD  Voice  Recognizer/Synthesizer  model 
1280  VRS  under  four  experimental  conditions.  Specifically,  the 
experimenters’  primary  aims  included  evaluating  the  recognizer’s 
performance  under  quiet  (0  dBA)  and  noisy  (75  dBA)  environmental 
conditions  as  well  as  the  relationship  between  the  recognizer  perfor¬ 
mance  and  the  syntax  utilized. 

Two  secondary  objectives  of  the  training  and  testing  included  an 
informal  evaluation  of  the  system’s  user  interface  and  the  overall 
training process. No particular experimental conditions were
dedicated toward these ends; however, user surveys and extensive
experimenters’ notes on the approximately 200 laboratory man-hours were
utilized to produce recommendations for further system development
and  training.  These  results,  while  principally  anecdotal  in  nature,  can 
at  a  minimum  serve  to  guide  final  system  designers  toward  the  most 
productive  designs  based  on  the  user  interface  and  other  human  fac¬ 
tors.  Within  this  section,  the  only  results  pertinent  to  these  secondary 
objectives  can  be  found  in  the  Questionnaire  Results  section.  Addi¬ 
tional  comments  regarding  the  overall  user-friendliness  of  the  system 
along  with  detailed  recommendations  on  training  have  been  deferred 
to  Chapter  VI  for  clarity. 
A.  DESIGN

A treatment-by-treatment-by-subject approach was utilized to test
across  the  two  noise  levels  and  syntax  conditions.  A  graphical  repre¬ 
sentation  of  the  design  can  be  found  in  Figure  5.1.  The  subjects  were 
considered  a  random  factor  and  the  syntactic  and  noise  conditions 
were  fixed. 


Figure  5.1 

Experimental  Design 

At this point during experimentation, no attempt was made to
simulate actual CATCC environmental conditions, control or otherwise,
beyond the use of selected CATCC phrases. This experiment was
designed  primarily  to  observe  the  relationship  between  noise  level  and 
recognition  accuracy  in  order  to  determine  possible  limitations  of  the 
recognizer  in  the  CATCC  and  to  test  the  recognizer’s  sensitivity  to  the 
syntactic  structure  used  for  CATCC  input. 

B.  SUBJECTS 

Twelve  volunteer  subjects  were  recruited  from  the  students  at  the 
Naval  Postgraduate  School.  Because  current  DOD  policy  does  not  per¬ 
mit  females  aboard  combat  vessels,  all  subjects  were  male.  Of  the 
twelve  subjects,  nine  were  naval  officers,  two  were  U.S.  Marines,  and 
one  subject  was  a  DOD  civilian.  Six  subjects  had  been  exposed  to  a 
continuous  automatic  speech  recognition  system  before  and  had 
between  one  and  five  hours  of  experience  combined  on  discrete  and 
continuous  ASR  systems.  Eight  of  the  subjects  had  direct  CATCC 
experience,  and  eleven  of  the  twelve  had  experience  with  the  vocabu¬ 
lary  through  flight  training/operations.  In  addition,  all  but  one  subject 
had  extensive  microphone  experience  in  CATCC  or  other  radio  opera¬ 
tions,  as  naval  aviators,  or  as  naval  flight  officers  (navigators).  Of  the 
twelve  subjects,  six  were  from  the  computer  systems  management 
curriculum  and  six  were  from  computer  science.  The  level  of  subject 
service experience was reflected in ranks ranging from O-3 to O-4 in
the Navy and O-3 in the Marine Corps. The civilian holds a GS-12
rating. 
C.  APPARATUS AND MATERIALS

A Sun-3/160M workstation with an ITT DCD model 1280 Voice
Recognizer/Synthesizer was utilized for this study. The complete
details of the system architecture can be found in Chapter IV, but it is
worth  noting  here  that  the  response  time  is  reported  to  average  .25 
seconds  with  a  vocabulary  capacity  of  approximately  2,000  words 
[Ref.  16]. 

The  Sun  workstation  and  ITT  ASR  board  were  augmented  with 
WYSE  WY-60  terminals  for  prompts  and  recognition  sets  as  well  as  a 
Shure  SM12A  microphone  as  an  input  device.  A  Hewlett-Packard 
model  465A  amplifier  was  used  between  the  microphone  and  ASR. 
The  microphone  was  later  changed  to  a  Plantronics  SNC  1436  noise¬ 
cancelling  microphone,  which  connected  directly  to  the  recognizer 
board,  allowing  removal  of  the  amplifier.  These  hardware  changes 
were  implemented  prior  to  final  testing  and  training  and  will  be 
explained  in  the  following  section  on  training. 

The  Sun  workstation  components  minus  the  computing  unit 
itself,  along  with  four  WYSE  terminals  and  the  microphone,  were  all 
located  in  a  7’  x  7'  controlled  Acoustical  Environments  chamber.  The 
chamber  is  a  nearly  soundproof  environment  with  internal  noise 
registering  0  dBA  when  external  noise  averages  60  dBA.  The  noise  for 
all  stages  was  thus  controlled,  with  noise  induced  through  experimen¬ 
tal  conditions  only. 

Specific  materials  used  in  the  conduct  of  the  experiment  included 
the  following: 
1.  Graphical illustrations of the four syntaxes (i.e., Approach,
Departure, Marshal, and Combined) for presentation to the
subjects (Appendix B)

2.  A master instruction sheet for experimenters to ensure
uniformity in testing (Appendix C)

3.  A  test  subject  information  sheet  to  gather  basic  subject 
information  (e.g.,  name)  and  user  interface  and/or  training 
problems  or  recommendations  (Appendix  D) 

4.  A  training  verification  sheet  for  confirmation  of  subject  vocabu¬ 
lary  templates  (Appendix  E) 

5.  A  subject-by-condition  testing  matrix  (Appendix  F) 

6.  Pre-testing  instructions  (subject)  for  the  test  (Appendix  G) 

7.  Computer-loaded  test  files  for  each  syntax  (Appendix  H) 

8.  A  computer  file  of  CATCC  radio  calls  to  use  through  a  DECTalk 
voice  synthesizer  as  part  of  the  induced  noise  (Appendix  I) 

9.  Response  phrase  sample  file  (Appendix  J) 

10.  A  post-test  questionnaire  to  gather  relevant  subject  informa¬ 
tion/qualifications  and  the  subject’s  impressions  of  the  system's 
usefulness  (Appendix  K) 


D.  PROCEDURES 

1.  Introduction

Before  the  conduct  of  the  training  or  experimental  sessions,  a 
15-minute  introduction  to  the  research  was  presented  in  a  graduate- 
level  course  at  the  Naval  Postgraduate  School.  During  this  introduc¬ 
tion,  the  students  were  told  the  purpose  of  the  research,  what  the 
experimental  design  was,  and  the  approximate  total  time  it  would  take 
to  participate  voluntarily.  This  was  followed  by  a  period  for  questions. 
It  is  worth  re-emphasizing  that  the  subjects  did  not  receive  monetary 
compensation  or  classroom  credit  for  their  participation  which,  as  a 
result, remained strictly voluntary. Subjects were asked to sign a
roster indicating they were interested in participating and commit to
three blocks of time, to include at least one two-hour block that would
not impose on their school or personal schedules. These rosters were
collected and a schedule was devised for the training and testing of 20
subjects, 18 of whose original time requests could be
accommodated.

The experimental phase was originally divided into two sessions for
each subject: training and testing. Both sessions were to be
conducted in the Man/Machine Systems Design Laboratory at the Naval
Postgraduate School inside the chamber previously discussed. All 20
volunteers were initially trained on the system in the manner
described below, but numerous recognizer error messages and software
bugs precluded the continuance of the testing phase. These
difficulties were alleviated by telephonic and electronic mail
consultations with NOSC designers/programmers as well as telephonic
and on-site consultations with ITT technical representatives. The
specific nature of the problems and solutions will be discussed in
Chapter VI. As a result of the time lag experienced with these
repairs, the number of subjects was reduced to 12 to allow completion
of the testing within the fixed time constraint for the return of
hardware to ITT and NOSC.

2.  Training 

Prior  to  the  subject's  arrival  for  a  given  experimental  session, 
the  experimenters  would  ensure  that  all  equipment  and  forms  were 
present.  Appendix  C  was  used  to  remind  experimenters  of  various 
training  and  testing  procedures  and  ensure  that  training  and  testing  of
various  subjects  was  consistent  over  time.  A  test  subject  information 
sheet  (Appendix  D)  was  filled  out  with  the  subject’s  name  to  record 
the  time  required  for  training  and  testing  as  well  as  any  noteworthy 
difficulties  encountered  during  testing  or  training. 

Upon  arrival  at  the  Man/Machine  Systems  Design  Laboratory 
for  the  experimental  session,  each  subject  was  briefed  on  the  training 
and  testing  methodologies  and  specific  procedures  they  would  be
following.  More  specifically,  the  experimenter  would  first  instruct  the
subject  on  using  the  speech  recognition  system.  This  included  the 
following  precepts: 

•  Position  the  microphone  slightly  to  the  side  of  and  nearly
touching  the  mouth.

•  Keep  microphone  position  constant  during  training  and  testing. 

•  Speak  with  consistent  volume  and  speed. 

•  Speak  in  a  style  consistent  with  normal  speech.  Unusual
enunciations  were  discouraged.

Next,  the  experimenter  would  brief  the  subject  on  the  training  to  be 
conducted  by  introducing  him  to  the  graphical  illustrations  of  the
syntaxes  of  the  words  he  would  encounter  as  well  as  discussing  the  order
in  which  the  training  would  take  place  (i.e.,  digit  training  followed  by 
full  vocabulary).  The  subject  was  then  told  that  following  training  he 
would  be  asked  to  read  through  a  series  of  test  phrases  to  ensure  he 
had  good-quality  templates. 

Once this introduction was completed, the subject would begin
training the vocabulary words/phrases on the Sun workstation as
prompted by the system. After completion of the training passes, the
experimenter would place the system into an open practice recognition
mode and the subject was asked to read each of the phrases on the
training verification sheet (Appendix E) three times. If any phrase
was not completely and correctly recognized at least two out of three
times, the experimenter would trace the recognition problem and
retrain the template(s) for the word(s) until all phrases were
recognized without error at least two out of three times. This
ensured that quality templates for the various utterances were
developed for each subject and allowed the subjects to see open
recognition of their trained vocabulary.
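
The two-out-of-three acceptance rule for the training verification sheet can be sketched in Python as follows. This is only an illustration: the function name, the data layout, and the sample phrase labels are assumptions made here and are not part of the NOSC or ITT software.

```python
from typing import Dict, List


def phrases_needing_retraining(results: Dict[str, List[bool]],
                               required: int = 2) -> List[str]:
    """Return the phrases recognized correctly fewer than `required` times.

    `results` maps each verification phrase to the outcomes of its three
    readings (True = completely and correctly recognized).
    """
    return [phrase for phrase, outcomes in results.items()
            if sum(outcomes) < required]


# Hypothetical verification-sheet outcomes for one subject.
sheet = {
    "phrase A": [True, True, False],   # 2 of 3: templates accepted
    "phrase B": [False, True, False],  # 1 of 3: trace and retrain
    "phrase C": [True, True, True],    # 3 of 3: templates accepted
}
print(phrases_needing_retraining(sheet))
```

In practice the experimenter would repeat the check after each retraining pass until the returned list is empty.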

3.  Testing 

Prior  to  explaining  the  testing  procedure  proper,  two  notes 
are  in  order  here.  First,  the  noise  condition  was  set  to  0  dBA  or  75 
dBA.  The  quiet  condition  was  chosen  in  an  effort  to  maximize
potential  recognizer  performance.  The  loud  75  dBA  condition  was  chosen
based  on  the  experimenters’  familiarity  with  CIC  environments  and  by 
actually  manipulating  the  noise  during  experimental  design  to  see 
what  sounded  loud  yet  would  still  be  tolerated  as  a  work  environment. 
Thus  this  choice  of  a  loudness  threshold,  while  somewhat  arbitrary, 
provides  a  basis  for  comparison  when  actual  measurements  of  the 
CATCC  noise  levels  can  be  taken.  Such  measurements  were  discussed 
but  proved  logistically  beyond  the  capabilities  of  this  research. 

The second note to be made here relates to the syntax conditions. The
two conditions are labelled “Combined” and “Separate.” Within the
CATCC there are three stations which handle different types of
statuses for the various aircraft. These are termed Approach,
Departure, and Marshal. Each of these stations has its own syntax
under the “Separate” syntactic condition; if an Approach controller
tried to use syntactic nodes (i.e., words and phrases) associated
with Departure or Marshal, the recognizer theoretically could not
find a response phrase match. This separation limits the number of
word paths the recognizer must choose between to match the spoken
phrase to a response phrase. In the “Combined” syntax, on the other
hand, these three separate syntaxes are joined together so any of the
personnel maintaining the status of the aircraft could use any of the
vocabulary. For this experiment specifically, there are 72 test
phrases, 24 from each syntax, which can be tested through the syntax
for which they were specifically designed or through a combined
syntax. Thus, during a test of the “Combined” syntax, a subject would
speak 24 Approach, 24 Departure, and 24 Marshal phrases. The
recognizer would use a combined syntax in looking for the response
phrases. During the “Separate” condition, each unique syntax would be
used for those phrases normally used by the specific person updating
the particular status. The overall question with regard to syntax
then is, “Is there a recognizer performance difference if the
syntaxes are kept separate or can they be combined with no
performance degradation?”
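
A toy illustration of why the “Separate” condition narrows the recognizer's search may help here. The sketch below is written in Python and uses entirely hypothetical phrases; the actual CATCC vocabularies are those listed in the appendices, and the recognizer's real matching algorithm is not described in this chapter.

```python
from typing import Dict, FrozenSet, Tuple

Phrase = Tuple[str, ...]

# Hypothetical toy syntaxes: each station allows only its own phrases.
SEPARATE: Dict[str, FrozenSet[Phrase]] = {
    "approach": frozenset({("one", "zero", "one", "final"),
                           ("one", "zero", "one", "waveoff")}),
    "departure": frozenset({("one", "zero", "one", "airborne"),
                            ("one", "zero", "one", "tanking")}),
    "marshal": frozenset({("one", "zero", "one", "holding"),
                          ("one", "zero", "one", "commencing")}),
}

# The combined syntax is the union of the three station syntaxes.
COMBINED: FrozenSet[Phrase] = frozenset().union(*SEPARATE.values())


def candidates(syntax: FrozenSet[Phrase], prefix: Phrase) -> int:
    """Count the word paths still open after hearing `prefix`."""
    return sum(1 for phrase in syntax if phrase[:len(prefix)] == prefix)


heard = ("one", "zero", "one")
# Paths an Approach controller's recognizer must choose between.
print(candidates(SEPARATE["approach"], heard))
# Paths open under the combined syntax for the same utterance.
print(candidates(COMBINED, heard))
# A Departure phrase has no match in the Approach syntax.
print(("one", "zero", "one", "airborne") in SEPARATE["approach"])
```

With more open paths at each node, the combined syntax gives the recognizer more opportunities to choose a wrong word, which is the effect the experiment was designed to measure.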

After training, the subject was given an explanation of the various
trial conditions (Noise vs. Quiet environment and Combined vs.
Separate syntaxes) under which the recognizer would be tested.
Appendix F is the subject-by-condition testing matrix developed to
minimize any learning or proficiency biases. Subjects were told the
order of the conditions in which they would participate and then
given written pre-testing instructions (Appendix G) to ensure they
knew what they would need to do to facilitate the testing. This
basically entailed typing in the name of the file of test phrases and
reading them with the appropriate pauses to page down to the next
phrases to be read when necessary. Most subjects reported being quite
comfortable with this after doing one example prior to the beginning
of testing. Notably, all subjects were Computer Technology (i.e.,
Information Systems) students, which resulted in little or no
apprehension regarding their retrieval of the test files because this
function is virtually routine in their studies.

The testing was then started with the subject facing one of the WYSE
terminals with one screen of his first test file in his view. Test
files for each syntax can be found in Appendix H. These files were
generated at random with the exception that no syntactic path would
be repeated until all paths were sampled at least once. The
experimenter would establish the noise condition, if required, by
calling the computer file containing simulated CATCC radio calls
(Appendix I) and running these calls through the voice synthesizer.
This noise was augmented by “white noise” produced by a standard
portable radio tuned between broadcast frequencies. Noise was
measured with a decibel meter prior to the subject beginning
calibration of the recognizer and, utilizing the same settings each
time, averaged 75 dBA. The experimenter, regardless of noise
condition, started a program to automatically record the recognizer’s
response phrases and set his screen to receive feedback on the
subject’s utterances. A sample of the response files created
automatically can be found in Appendix J. The subject was then
instructed to begin reading the phrases as per the instructions.
Subjects thus had no feedback on recognizer performance utilizing the
WYSE terminal, while the experimenter could watch the response
phrases appear on the Sun workstation. In this way, the experimenter
could “coach” the subject if he was speaking too rapidly or if he
repeated a phrase or perhaps misspoke. Subjects were asked to reread
any phrases they misspoke, whether discovered by the experimenter or
self-reported. Each subject read through the various test phrases
twice under each noise condition, once in a “separate” syntax and
once in a “combined” syntax. Table 5.1 illustrates this more clearly.
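
The constraint placed on test-file generation (random order, but no syntactic path repeated until every path has been sampled once) can be met by drawing the paths in shuffled rounds. The Python sketch below shows one way to do this; it is not the program actually used to build the Appendix H files, and the function name and path labels are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def generate_test_file(paths: Sequence[str], n_phrases: int,
                       seed: Optional[int] = None) -> List[str]:
    """Draw `n_phrases` at random so that no path repeats until every
    path has been used once (sampling in shuffled rounds)."""
    rng = random.Random(seed)
    phrases: List[str] = []
    while len(phrases) < n_phrases:
        round_ = list(paths)
        rng.shuffle(round_)       # random order within this round
        phrases.extend(round_)
    return phrases[:n_phrases]


# Hypothetical example: 10 syntactic paths, 24 test phrases.
paths = [f"path-{i}" for i in range(10)]
test_file = generate_test_file(paths, 24, seed=1)
print(test_file[:10])
```

Each block of ten phrases in the result is a permutation of the ten paths, so every path appears once before any path appears a second time.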

After  completing  each  condition,  the  subject  was  assisted 
with  retrieving  the  next  set  of  test  phrases  as  required,  the  automatic 
response  file  was  created  for  the  next  condition,  and  the  subject  began 
the  next  test  phase. 

After completing the final test condition, subjects were asked to
fill out a survey (Appendix K) designed to gather subject data that
might be pertinent to the recognizer’s performance as well as the
subject’s impressions of the “friendliness” of the system and the
training itself. Subjects were then debriefed again on the purpose of
the system being tested and thanked for their participation.


TABLE 5.1

TEST CONDITION PHRASES AND SYNTAXES

Phrase                                  # of       Cumulative
Number     Syntax        Condition      Phrases    # of Phrases

1-24       Approach*     0 dBA          24          24
25-48      Departure*    0 dBA          24          48
49-72      Marshal*      0 dBA          24          72
1-72       Combined      0 dBA          72         144
1-24       Approach*     75 dBA         24         168
25-48      Departure*    75 dBA         24         192
49-72      Marshal*      75 dBA         24         216
1-72       Combined      75 dBA         72         288

*These three separate syntaxes, with 24 test phrases each, combine to
make up the separate syntax condition. The phrase numbers (1-72) in
the separate syntax are the same phrases (1-72) in the combined
syntax.
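
As a quick arithmetic check on the per-subject workload, the phrase counts for the eight condition blocks of Table 5.1 can be tallied with a running sum. The short Python sketch below does only that; the variable names are illustrative.

```python
from itertools import accumulate

# (syntax, noise condition, number of phrases), in the order of Table 5.1.
conditions = [
    ("Approach", "0 dBA", 24),
    ("Departure", "0 dBA", 24),
    ("Marshal", "0 dBA", 24),
    ("Combined", "0 dBA", 72),
    ("Approach", "75 dBA", 24),
    ("Departure", "75 dBA", 24),
    ("Marshal", "75 dBA", 24),
    ("Combined", "75 dBA", 72),
]

# Running total of phrases spoken by one subject.
running = list(accumulate(n for _, _, n in conditions))
for (syntax, noise, n), total in zip(conditions, running):
    print(f"{syntax:<10} {noise:<7} {n:>3} {total:>4}")
```

The final running total is 288 phrases per subject, matching the last row of the table.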

Finally,  to  ensure  that  data  was  not  lost,  a  print-out  of  each 
subject’s  test  response  files  was  made  and  placed  in  a  folder  with  the 
subject’s  questionnaire  and  subject  information  sheet.  The  contents  of 
these  folders  were  then  held  until  scoring  and  results  analysis  began. 

E.  RESULTS 

1.  Dependent  Variable 

During all of the experimental trials, the response phrase of the
recognizer was recorded automatically in response phrase computer
files like the example contained in Appendix J. The “correctness” of
this response phrase as compared to the spoken phrase was the
dependent variable for all trials. This dependent variable, however,
was scored in two separate ways in order to examine the results from
more than one perspective.

The work of Rodman, Joost, and Moody [Ref. 18] provided a method of
scoring connected speech recognition systems utilizing reported
phrases and the spoken phrases. This method provides two scores for
each phrase spoken and was the first method chosen to evaluate the
experimental response phrases. The first score in this method is
based on the number of words reported correctly, in the correct
order, divided by the number of words spoken. The second is a
calculation of the number of words reported incorrectly divided by
the number of words spoken. This scoring method was utilized because
of the number of types of errors that can occur in connected speech
recognition. These include substitutions, insertions, deletions,
merge errors, and split errors as well as preshadowing and
postshadowing. Table 5.2 is provided (adapted from Rodman, et al.) as
a brief introduction to these types of errors. Thus, this scoring
method can provide more information in terms of the types of errors
which are likely than simply recording the percentage of spoken
phrases which were recognized without error.
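
The two scores can be computed mechanically. The Python sketch below is one possible reading of the method, not the Rodman, et al. procedure itself: it assumes that “words reported correctly, in the correct order” can be counted as the longest common subsequence of the spoken and reported word lists, and that every other reported word counts as incorrect. Under that assumption it reproduces the example scores given in Table 5.2.

```python
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for word_a in a:
        curr = [0]
        for j, word_b in enumerate(b, start=1):
            if word_a == word_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score_phrase(spoken: str, reported: str) -> Tuple[float, float]:
    """Return <correct, incorrect> scores, each divided by words spoken.

    Assumes `spoken` contains at least one word.
    """
    s, r = spoken.lower().split(), reported.lower().split()
    correct = lcs_length(s, r)      # right words in the right order
    incorrect = len(r) - correct    # reported words that were not spoken
    return round(correct / len(s), 2), round(incorrect / len(s), 2)


print(score_phrase("Coax fire on target", "Coax fire target"))     # deletion
print(score_phrase("Can't go faster", "Can't go fast gunner"))     # split
```

Note that the second score counts reported words, so an utterance with many insertions can score above 1.0 on it.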

TABLE 5.2

TABLE OF COMMON ERROR TYPES
OF CONNECTED RECOGNITION

(adapted from Rodman, et al., 1987)

Simple Substitution: One word is substituted for another.

    e.g. Spoken:   I can’t fire faster.
         Reported: Tank can’t fire faster.        score = <.75, .25>

Simple Insertion: An additional word is inserted.

    e.g. Spoken:   Coax fire on target.
         Reported: Coax fire on target go.        score = <1.0, .25>

Simple Deletion: A word is left out.

    e.g. Spoken:   Coax fire on target.
         Reported: Coax fire target.              score = <.75, 0.0>

Merge: Two or more words are recognized as one.

    e.g. Spoken:   Move tank slower right.
         Reported: Any slower right.              score = <.5, .25>

Split: One or more words are recognized as two or more.

    e.g. Spoken:   Can’t go faster.
         Reported: Can’t go fast gunner.          score = <.67, .67>

Preshadowing: A word resembling one of the syllables at the beginning
of a correct word is inserted before the correct word.

    e.g. Spoken:   Move tank slower right.
         Reported: Move any tank slower right.    score = <1.0, .25>

Postshadowing: A word resembling one of the syllables at the end of a
correct word is inserted after the correct word.

    e.g. Spoken:   M-60 turn rear.
         Reported: M-60 cease turn rear.          score = <1.0, .33>


The second scoring method utilized was, in fact, a method originally
rejected as an oversimplification of the complex task of measuring
the recognizer’s accuracy. It is the calculation of the percentage of
response phrases which are equal to the spoken phrases without error.
This method was utilized to illustrate the raw recognition rate in
the prototype’s environment where there was no method for error
correction and where any required correction would necessitate
repetition of the entire phrase. This, it is suggested by Pallet
[Ref. 19], is the most appropriate method for an environment where
this sort of whole phrase repetition is required for correction.

One final note on scoring is appropriate here. There were a few
occasions during the sessions where the subject and the experimenter
inadvertently missed speaking a phrase for one reason or another.
These phrases were scored <-1, -1> across both scoring methods and
were discarded during statistical analysis.
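
The second, whole-phrase scoring method, together with the rule for discarding missed phrases, can be sketched in Python as follows. The data layout is an assumption made for illustration (a missed phrase is represented here by `None` rather than by the <-1, -1> marker), and the sample phrases are hypothetical.

```python
from typing import List, Optional, Tuple


def phrase_accuracy(trials: List[Tuple[str, Optional[str]]]) -> float:
    """Fraction of spoken phrases whose response phrase matched exactly.

    Each trial is (spoken, reported). `reported` is None when the phrase
    was missed during the session; such trials are left out of the rate.
    Assumes at least one trial was actually spoken.
    """
    scored = [(s, r) for s, r in trials if r is not None]
    exact = sum(1 for s, r in scored if s.split() == r.split())
    return exact / len(scored)


# Hypothetical trials: one exact match, one deletion, one missed phrase,
# and one more exact match.
trials = [
    ("a b c", "a b c"),
    ("a b c", "a c"),
    ("d e", None),
    ("d e", "d e"),
]
print(round(phrase_accuracy(trials), 3))
```

Unlike the two-score method, this measure gives no credit for a partially correct phrase, which is appropriate when any error forces the whole phrase to be repeated.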

2.  Results  Using  Rodman,  et  al.  Scoring

Table 5.3 presents the analysis of variance for the first of the
Rodman, et al. scores, that of the “number of words reported
correctly (including being in the right order) divided by the number
of words spoken.” [Ref. 18:p. 272] That is, this analysis is
fundamentally an analysis of the percent of correct words recognized.
As illustrated, a significant main effect of syntax was discovered
(F = 4.7996, p < .06) with no other main effects or interactions
reaching a significant level. The overall mean score achieved by
dividing the number of correct words recognized by the number spoken
was .95958. This can be interpreted as indicating that nearly 96
percent of the words spoken are recognized correctly in the correct
order. The mean scores for the number correct divided by the number
spoken by syntax are shown in Figure 5.2.


TABLE 5.3

ANALYSIS OF VARIANCE SUMMARY TABLE OF THE
NUMBER OF WORDS REPORTED CORRECTLY
DIVIDED BY THE NUMBER OF WORDS SPOKEN

SOURCE            df        SS         MS         F

Noise (N)          1      .0686      .0686      .4613
Syntax (S)         1     1.1932     1.1932     4.7996
Subjects (Su)     11     4.5894      .4172
N x S              1      .0602      .0602      .4343
N x Su            11     1.6360      .1487
S x Su            11     2.7343      .2486
N x S x Su        11     1.5245      .1380
Error           3387    70.7902      .0209
TOTAL           3434


Figure 5.2

[Plot of the number of words reported correctly divided by the number
of words spoken (vertical axis, .93 to .97) against noise condition
(75 dBA, 0 dBA). Plotted means: separate syntax, .9785 and .9779; the
other two plotted means are .9496 and .9323.]
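
The F ratios in Table 5.3 can be reproduced from the printed sums of squares if each effect is tested against its interaction with subjects, the usual choice of error term for a within-subjects design. That choice is an inference from the printed values rather than something stated in the text, and the Python names below are illustrative only.

```python
# Sums of squares and degrees of freedom as printed in Table 5.3.
table_5_3 = {
    "N": (0.0686, 1),
    "S": (1.1932, 1),
    "NxS": (0.0602, 1),
    "NxSu": (1.6360, 11),
    "SxSu": (2.7343, 11),
    "NxSxSu": (1.5245, 11),
}


def mean_square(source: str) -> float:
    """Mean square = sum of squares divided by degrees of freedom."""
    ss, df = table_5_3[source]
    return ss / df


def f_ratio(effect: str, error_term: str) -> float:
    """F = MS(effect) / MS(effect x subjects), assumed error term."""
    return mean_square(effect) / mean_square(error_term)


for effect, error_term in [("N", "NxSu"), ("S", "SxSu"), ("NxS", "NxSxSu")]:
    print(f"{effect:<4} F(1, 11) = {f_ratio(effect, error_term):.4f}")
```

The small differences from the printed F values in the third or fourth decimal place are consistent with the table entries having been rounded to four places before printing.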

Table  5.4  presents  the  analysis  of  variance  for  the  second  of 
the  Rodman,  et  al.  scores,  that  of  the  “number  of  words  reported 
incorrectly  divided  by  the  number  of  words  spoken."  [Ref.  18,  p.  272]. 
This  analysis,  therefore,  is  fundamentally  an  analysis  of  the  percent  of 
incorrect  words  recognized.  In  some  cases,  however,  the  number  of 
words  reported  incorrectly  can  and  does  exceed  the  number  of  words 
spoken,  thereby  creating  a  value  greater  than  one.  Thus,  in  this  sense 
this  measure  is  not  a  strict  percentage.  As  shown,  a  significant  main 
effect  of  syntax  was  again  discovered  (F  =  5.1580,  p  <  .05)  with  no 
other  main  effects  or  interactions  reaching  a  significant  level.  The 
overall  mean  for  these  calculations  was  .03930.  Mean  scores  for  the 
numbers  of  words  reported  incorrectly  divided  by  the  number  spoken 
for  each  syntax  are  shown  in  Figure  5.3.  The  relatively  low  value  of 
this  score  indicates  that  the  errors  of  the  system  tested  tend  to  be 
primarily  deletion  or  substitution  errors.  This  was  found  true  by 
observation  alone  but  these  results  can  statistically  provide  the  basis 
for  recommendations  concerning  correction  schemes  which  will 
maintain  the  portion  of  the  phrase  that  is  correct  and  insert  or 
replace  for  the  deletion  or  substitution  as  appropriate  to  produce  the 
desired  output. 


TABLE 5.4

ANALYSIS OF VARIANCE SUMMARY TABLE OF
THE NUMBER OF WORDS REPORTED INCORRECTLY
DIVIDED BY THE NUMBER OF WORDS SPOKEN

SOURCE            df        SS         MS         F

Noise (N)          1      .0596      .0596      .3174
Syntax (S)         1     1.5577     1.5577     5.1580
Subjects (Su)     11     5.2736      .4794
N x S              1      .1027      .1027      .7431
N x Su            11     2.0660      .1878
S x Su            11     3.3218      .3020
N x S x Su        11     1.5197      .1382
Error           3387    91.6240      .0270
TOTAL           3434


Figure 5.3

Syntax vs. Noise Incorrect Results Using Rodman et al. Scoring

[Plot of the number of words reported incorrectly divided by the
number of words spoken against noise condition (75 dBA, 0 dBA).
Legible plotted means: separate syntax, .0193 and .0167; one further
point, .0703.]

Utilizing the second scoring method by simply figuring the percentage
of phrases recognized with and without error created a distribution
which was binomial rather than normal. Research has shown that the F
test is very robust and can give an indication of significance
despite this type of distribution [Ref. 20]. An analysis of variance
was therefore conducted and yielded the same significant main effect
of syntax as with the previously mentioned scoring methods. The
results indicated an F value of 5.3920 with p < .05. Also similar to
the other scoring method, no other main effects or interactions
reached a significant level. The overall mean for correct phrases
using this straight percentage scoring method was .90160, while the
mean incorrect is, of course, the remaining .09840. These scores,
while appearing lower in terms of recognition quality, are averaged
across all four experimental conditions and ranged between 87 percent
completely correct recognition in the noisy environment with the
combined syntax to roughly 93 percent for the separate syntaxes under
both noise conditions. It is clear that the separate syntaxes provide
a statistically better likelihood of completely correct phrase
recognition, as illustrated with this scoring method, and more
completely correct phrase recognition when errors do exist, as shown
with the first scoring method. This result, combined with an error
correction scheme, may present a design modification which is not
only statistically significant but practically significant. This
notion will be discussed further in the following chapter.

The user survey conducted was targeted specifically to determine the
pertinent demographic information about the subjects (e.g.,
experience level, military grade) and their opinions regarding the
training and the user interface of the prototype system.
Questionnaire results indicated that 4 of the 12 subjects had no
CATCC experience, 2 had been exposed to the environment, indicating
experience levels of 5 and 20 hours, and 6 of the 12 officers had an
average experience level of 26.3 months through exposure in flight
briefings or direct assignment. All subjects indicated they were very
comfortable (9/12) or comfortable (3/12) with the vocabulary used in
the experiment. In addition, 11 of 12 and 1 of 12 responded they were
very comfortable and comfortable, respectively, with using a
microphone.

Figures 5.4 through 5.7 indicate the subjects’ responses to the
training itself and the user interface the system provided through
the hardware discussed earlier. As is graphically evident in Figure
5.4, all subjects found the training “Quite Easy” at the very least,
and seven of them rated it the highest possible “Very Easy.” The
experimenters believe, however, that there is something of a
subject/experimenter bias with the normal peer relationship existing
between the two. That is, subjects may have felt that they were
rating the quality of the experimenter as a trainer and were biased
by their normal relationships. The intent of the question was not to
measure this but rather to get the subject’s reaction to any delays
experienced because of system hardware or software errors. These
sorts of delays were recorded by the experimenter for all subjects
during training and testing and will be commented upon in the
conclusions and recommendations chapter. Figures 5.5 through 5.7
indicate subjects’ reactions to various components of the interface
between the user and the system. Most of the system will change prior
to final implementation, as is the case with many prototypes. This is
especially true of the visual displays since they will need to be
readable from certain distances in the CATCC and therefore will need
to be designed with the appropriate size, illumination, and/or
colors. Subject opinions about the screens were relatively positive
as per Figures 5.5 and 5.6, but the reactions to the microphone were
the most varied. This points to a particularly critical design
consideration because the microphone utilized was one of the few
hardware components which eventually may be carried over into the
final design.


Figure 5.4

The Training Session, as Guided by the Experimenter, Was:

[Bar chart of subject responses. Scale: Very Easy, Quite Easy, Fairly
Easy, Borderline, Fairly Difficult, Quite Difficult, Very Difficult.]


Figure 5.5

The Quality of the Sun Workstation Display Used for Training

[Bar chart of subject responses.]


Figure 5.6

The Quality of the WYSE Display Used for Testing

[Bar chart of subject responses. Scale: Excellent, Good, Only Fair,
Poor, Terrible.]


Figure 5.7

[Bar chart of subject responses. Scale: Very Satisfied, Satisfied,
Borderline, Dissatisfied, Very Dissatisfied, No Response.]

The most critical questions addressed by the subjects were the final
four. The first two questions were intended to elicit whether the
subject felt that voice recognition technology was appropriate for
the CATCC environment. The results of these questions can be found in
Figures 5.8 and 5.9. The totals for each of the questions are
somewhat misleading depending on the credibility we assign to those
without experience or exposure to the CATCC environment. In fact, as
illustrated by the figures, although those with experience or
exposure to the CATCC find voice input technology generally more
acceptable than borderline or unacceptable for the CATCC or CIC
environment, they are not as “positive” as are their less-experienced
peers. The same can be said for their responses to whether they would
accept a fully developed system if they were responsible for the
operation of a CATCC. A listing of the responses to the final two
“open-ended” questions can be found in Appendix L. Most responses
center on the issues of reliability, maintainability, display
quality, noise, and the trainability of the system. As will be
further discussed in Chapter VI, these topics may become weighty
considerations for final design features.


Figure 5.8

How Acceptable or Unacceptable Do You Feel Voice Input
Technology Is for the CATCC or CIC Environment?

[Bar chart of subject responses.]


Figure 5.9

If You Were Responsible for the Operation of a CATCC or CIC,
How Would You Accept a Fully Developed Voice Input Status
Board System to Replace the Current Methodology?

[Bar chart of subject responses, with experienced/exposed subjects
marked separately. Scale: Without Hesitation, With Little Hesitation,
With Some Hesitation, With Great Hesitation, No Response.]


VI.  EVALUATION,  RECOMMENDATIONS,  AND  CONCLUSIONS

This chapter is a compilation of the experimental results and the
recommendations and conclusions which logically follow. The first
section is an evaluation of the NOSC prototype system. This is then
followed by a section of recommendations to final system designers.
These recommendations, while they may be linked to objective
experimental results, may in fact be based on the results of user
surveys (i.e., user experience) or the experimenters’ own experience
with the system. The recommendations are thus intended to be
pragmatic and give a sense of what design elements might work and
help eliminate potential problems vice those that are strictly proven
by laboratory experimentation. The basis, whether experimental or
otherwise, will be noted with each recommendation. These
recommendations will then be followed by general conclusions.

A.  EVALUATION 

1.  General

The prototype system provided for the evaluation of the use of speech
in the CATCC environment evidenced at least one major flaw common to
prototypes. Pressman points out that prototyping can be problematic
as a model for software engineering because “The customer sees what
appears to be a working version of the software, unaware that the
prototype is held together ‘with chewing gum and baling wire,’
unaware that in the rush to get it working we haven’t considered
overall software quality or long-term maintainability.” [Ref. 21:p.
23] The NOSC prototype was typical of prototypes in this respect. The
system as a whole was functional, but when experimentation began a
number of dysfunctions occurred simply because the prototype, as a
prototype, was not as robust as a final system design would have
been. For example, numerous recognizer errors were encountered. These
errors, specifically error numbers 9 and 11, had been virtually
unseen during NOSC development, but with the approximately 200 hours
of training, testing, and simply “playing” with the system these
errors were so abundant that they caused a week-long delay in final
experimentation while an on-site consultation was conducted to fix
the problems. Most of the additional difficulties discussed below
are, in the opinion of the experimenters, related to this prototyping
paradigm of development.

This is not to excuse these system flaws per se, but simply to
evaluate them as a part of the environment in which the prototype was
developed and the purpose (i.e., to evaluate the use of voice
recognition technology in a CATCC) for which it was developed.

2.  Hardware 

At least one major hardware problem was encountered with the NOSC
prototype. The delivered system utilized a Shure SM12A microphone
connected through a Hewlett-Packard model 465A amplifier to the ITT
automatic speech recognition board. A trace was attempted to isolate
the source of numerous recognizer errors (averaging four to five per
one-hour session), indicating lost communication with the recognizer.
Most attempts to eliminate other sources, such as software macro
programs, were unsuccessful, but an ITT consultant pointed out that
the button on the Shure microphone was not providing the hardware
with an on/off disconnect the recognizer board could identify. ITT
provided a Plantronics SNC 1436 noise-cancelling microphone which
supplied the connect/disconnect signal the recognizer required, which
reduced the “lost communications with recognizer” messages to nearly
zero. Notably, one other source, a software source, was discovered as
related to the “lost communications with recognizer” errors. This
will be discussed in the software section below.

Another  concern,  not  necessarily  a  problem  for  prototype
testing,  is  the  long-term  maintainability  of  the  system  hardware.  Military
systems  are  typically  “ruggedized”  to  meet  the  unusually
demanding  requirements  of  24-hour-per-day  operational  or  combat
environments.  In  fact,  27  percent  of  the  user  comments  relating  to
the  major  issues  with  regard  to  utilizing  voice  input  in  the  CATCC/CIC
were  tied  to  system  maintenance,  reliability,  and  system  ruggedness
(e.g.,  the  ability  to  operate  in  degraded  or  unusual  conditions).  The
system  tested  as  a  prototype  appropriately  used  off-the-shelf  commercial
hardware.  This  hardware,  while  not  put  to  the  test  in  a  closed
laboratory  environment,  may  have  its  ruggedness  challenged  with
around-the-clock  use  in  an  operational  or  combat  environment.

Hardware  performance,  other  than  the  microphone  difficulty,
was  quite  positive.  Objective  experimental  results  put  raw  recognition
of  correct  words  at  nearly  98  percent,  with  incorrect  words  as  low  as
1.67  percent  if  separate  syntaxes  for  the  different  stations  are  utilized.
This  recognition  rate  is  competitive  with  commercial  automatic
speech  recognition  hardware  and/or  software  and  could,  it  is  believed,
depending  upon  how  it  is  configured  with  software,  prove  quite  effective
for  the  system.

3.  Software 

A  number  of  difficulties  were  encountered  with  regard  to  the
software  utilized  in  the  NOSC  system.  The  first,  and  initially  the  most
harmful,  trouble  was  a  synchronization  problem  between  the  macro
programs  utilized  to  train  the  user’s  speech  templates  and  the  recognizer
itself.  The  macro  programs,  written  in  UNIX  Command  Script,
were  essentially  designed  to  run  automatically  once  the  training  selection
was  made  from  the  main  menu.  These  programs  would  be  automatically
invoked  at  specific  times  during  training.  While  the  programs
were  loading  and  executing  (usually  less  than  a  few  seconds),
the  user  would  not  have  a  prompt  to  speak,  so  he  would  be  silent.  The
recognizer,  on  the  other  hand,  would  be  “looking”  for  an  utterance.
The  recognizer  would  eventually  “time  out”  prior  to  the  completion  of
the  macro  execution,  and  by  the  time  the  user  received  his  on-screen
training  prompt,  a  recognizer  error  would  also  be  present.  This  type
of  software  difficulty  was  addressed  by  Pressman  as  another  problem
with  a  prototyping  methodology.

The  developer  often  makes  implementation  compromises  in  order
to  get  a  prototype  working  quickly.  An  inappropriate  operating  system
or  programming  language  may  be  used  simply  because  it  is
available  and  known;  an  inefficient  algorithm  may  be  implemented
simply  to  demonstrate  capability.  After  a  time,  the  developer  may
become  familiar  with  these  choices  and  forget  the  reasons  why  they
were  inappropriate.  The  less-than-ideal  choice  has  now  become  an
integral  part  of  the  system.  [Ref.  21:p.  23]

To  us  as  users  of  the  system,  it  is  unclear  whether  UNIX  Command  Script
is  the  optimal  programming  language  for  the  system.  The
primary  NOSC  developer  reports  it  was  used  based  on  his  own
programming  background  and  familiarity.  Suffice  it  to  say  here  that  a
full-scale  requirements  analysis  and  subsequent  design  will  be
required  utilizing  the  refinements  discovered  by  working  with  the
prototype.  This  lesson  can  be  extended  not  only  to  this  particular
software  aspect  but  also  to  the  following  software  issues  and  the
hardware  problems  previously  discussed.
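
The synchronization problem between the training macros and the recognizer, described earlier in this section, can be illustrated with a short timing sketch. The sketch is written in Python rather than the UNIX Command Script used in the prototype, and it is not the NOSC or ITT implementation. The durations, the `TrainingStep` structure, and the `listen_after_macro` option are all hypothetical. It shows only that a listening window opened before the macro finishes can expire before the user is prompted, while a window opened when the prompt appears does not.

```python
from dataclasses import dataclass


@dataclass
class TrainingStep:
    macro_seconds: float     # hypothetical time the macro needs to load and run
    response_seconds: float  # hypothetical time the user needs after the prompt


def recognizer_times_out(step, timeout_seconds, listen_after_macro):
    """Return True if the listening window expires before the user speaks."""
    # Moment the user starts speaking, measured from the start of the step.
    speech_time = step.macro_seconds + step.response_seconds
    # The window opens at once, or only after the macro has shown the prompt.
    window_opens = step.macro_seconds if listen_after_macro else 0.0
    return speech_time - window_opens > timeout_seconds


step = TrainingStep(macro_seconds=3.0, response_seconds=1.5)
print(recognizer_times_out(step, timeout_seconds=2.0, listen_after_macro=False))
print(recognizer_times_out(step, timeout_seconds=2.0, listen_after_macro=True))
```

With these assumed numbers the first call reports a timeout and the second does not, which mirrors the behavior the experimenters observed when the recognizer listened while the macros were still running.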

The  user  interface  provided  by  the  software  was  often  very
problematic.  These  problems  fell  generally  into  two  categories:  (1)
features  necessary  for  user-friendly  system  operation  which
are  not  implemented  in  the  software,  and  (2)  features  built
into  the  software  which  are  in  some  way  limiting  to  the  user.  An
example  of  a  missing  feature  needed  to
make  the  system  user  friendly  would  be  a  volume  meter  so  the  user
could  adjust  his  voice  volume  to  a  level  which  will  help  create  accurate
templates.  This  approach  has  been  used  by  other  commercial  vendors
(e.g.,  Votan).  Another  example  would  be  the  ability  to  enroll  single
words  vice  the  entire  vocabulary.  The  current  software  requires  the
user  to  enroll  the  entire  vocabulary  for  the  CATCC  at  one  time.  This
means  that  if  the  user  makes  a  critical  mistake  enrolling  one  template
and  wants  to  start  “from  scratch”  for  that  word,  he  must  re-enroll  the
entire  vocabulary.  This  type  of  re-enrollment  was  required  in  nearly
40  percent  of  all  subject  training.  An  additional  concern  with  regard
to  creating  and  adjusting  templates  was  the  sheer  length  of  the  training
programs.  Again,  users  could  not  break  off  their  session  without
having  to  repeat  what  they  had  already  trained,  so  any  sort  of  incremental
training  was  extremely  limited  by  the  software.
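
The single-word enrollment and incremental training discussed above can be sketched as a small data structure. This is a minimal illustration in Python under stated assumptions: the `TemplateStore` class, its methods, and the sample labels are hypothetical and do not describe the prototype's actual template files. The point is that keeping one template per word lets a user redo a single word, or stop and resume, without repeating the rest of the vocabulary.

```python
class TemplateStore:
    """Keeps one voice template per vocabulary word (hypothetical structure)."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)
        self.templates = {}  # word -> list of training samples

    def enroll(self, word, samples):
        """Create or replace the template for a single word only."""
        if word not in self.vocabulary:
            raise KeyError(f"{word!r} is not in the vocabulary")
        self.templates[word] = list(samples)

    def pending(self):
        """Words still lacking a template, so a session can resume later."""
        return [w for w in self.vocabulary if w not in self.templates]


store = TemplateStore(["UPDATE", "PROFILE", "TRAP", "SEND"])
store.enroll("UPDATE", ["sample-1"])
store.enroll("PROFILE", ["sample-1"])
print(store.pending())                 # words left for a later session
store.enroll("UPDATE", ["sample-2"])   # redo one word; PROFILE is untouched
print(store.templates["PROFILE"])
```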

Yet  another  feature  not  found  in  the  system  software  was
clear  and  understandable  terms  for  attempting  to  move  within  it.  For
example,  to  the  user  concerned  with  an  operational  environment,
much  of  the  voice  recognition  style  language  (e.g.,  “Open  Recognition”)
could  be  transformed  into  more  familiar  terms  (e.g.,  “OK”).
These  features,  while  minutiae  to  developers,  can  be  the  difference
between  a  system  that  is  truly  geared  toward  the  user  and  subsequently
used  by  him/her  and  a  system  that  is  developed  and  shelved
because  users  consider  it  unfriendly  or  difficult.

Current  limiting  factors  of  the  software  include  such  items  as
having  to  repeat  an  entire  phrase  to  correct  a  single  error  and  the
inability  to  abort  out  of  a  training  phase  if  one  template  is  particularly
poor  without  going  through  all  well-trained  templates  upon  returning.
The  first  limiting  factor  is  tied  directly  to  the  lack  of  a  feature,  that  of
an  error-correcting  scheme.  Poock  and  Martin’s  research  shows  that
error-correcting  schemes  have  the  potential  to  increase  the  efficiency
of  an  automatic  speech  recognition  system  [Ref.  22],  and  the  lack  of
such  a  scheme  in  this  particular  context  requires  the  user  to  repeat
the  whole  phrase.  For  example,  if  a  user  said  “UPDATE  10  5  PROFILE
TRAP”  and  the  recognized  phrase  was  “UPDATE  10  9  PROFILE
TRAP,”  with  the  present  system  software  the  user  would  simply  have
to  repeat  the  whole  phrase  to  get  it  correct  before  saying  the  word
“SEND”  to  move  the  results  to  the  appropriate  CATCC  status  board.
An  error-correction  scheme  would  allow  the  recognized  “9”  to  be
changed  to  a  “5”  without  repeating  the  entire  phrase,  perhaps  with  an
utterance  like  “CHANGE  9  to  5.”  This  would  increase  user/system
flexibility  and  maximize  the  recognizer’s  potential  advantages  (e.g.,
speed).  Specific  recommendations  regarding  a  possible  error-correction
scheme  will  be  detailed  in  the  recommendations  section  which
follows.
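
The “CHANGE 9 to 5” idea above can be sketched in a few lines of Python. This is one possible reading of such a scheme, not a design taken from the prototype or from Ref. 22. The function name `apply_correction` and the rule that the word to be replaced must occur exactly once are assumptions made for the illustration.

```python
def apply_correction(phrase, command):
    """Apply a spoken 'CHANGE <old> TO <new>' command to a recognized phrase."""
    tokens = command.upper().split()
    if len(tokens) != 4 or tokens[0] != "CHANGE" or tokens[2] != "TO":
        raise ValueError(f"not a correction command: {command!r}")
    old, new = tokens[1], tokens[3]
    words = phrase.split()
    # Assumed rule: refuse ambiguous corrections instead of guessing.
    if words.count(old) != 1:
        raise ValueError(f"{old!r} must occur exactly once in the phrase")
    words[words.index(old)] = new
    return " ".join(words)


print(apply_correction("UPDATE 10 9 PROFILE TRAP", "CHANGE 9 to 5"))
```

Applied to the example from the text, the misrecognized phrase becomes “UPDATE 10 5 PROFILE TRAP” without the user repeating the whole utterance.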

Aborting  out  of  training  and  its  subsequent  retraining
requirement  is  due  to  the  types  of  recognition  which  are  programmed
as  part  of  the  software.  The  recognizer’s  message  will  be  “OPEN
RECOGNITION”  if  the  phrase  matches  the  template  within  the  set
recognition-scoring  threshold.  The  message  will  be  “FORCED
RECOGNITION”  if  it  falls  within  the  next  boundary  of  the  thresholds;
practically,  this  means  the  phrase  was  considered  close  but  not  within
the  bounds  for  open  recognition.  The  utterance  which  elicited  the
“FORCED  RECOGNITION”  response  may  then  be  forced  into  the
adjustment  of  the  templates  or  repeated,  depending  on  whether  the
user  felt  he  uttered  the  phrase  accurately  or  inaccurately,  respectively.
Finally,  the  user  may  get  a  message  “FORCED  RECOGNITION
FAILURE.”  In  most  cases,  this  message  means  one  of  two  things.  Either  the
user  uttered  the  phrase  so  poorly  (or  uttered  a  different  phrase  altogether)
that  the  recognizer  could  not  find  a  match,  in  which  case  repeating  the
phrase  will  remedy  the  problem;  or  the  user  uttered  the  phrase  correctly
but  the  template  is  poorly  trained,  so  the  recognizer  does  not
find  a  match.  In  this  latter  case,  the  user  is  trapped  by  the  system
software.  He  can  delay  the  appearance  of  this  phrase  as  a  prompt  by
choosing  “GO  TO  NEXT  PHRASE”  on  the  menu,  but  the  phrase  will
reappear  at  a  later  time  and  eventually  cause  the  user  to  abort  out  of
training  since  he  will  not  be  able  to  achieve  a  match  on  this  utterance.
This  combines  with  a  previously  mentioned  feature  which  is  not  available
on  the  menu,  that  is,  to  enroll  or  train  a  specific  word  or  phrase,
to  make  enrolling  and  training  quite  inflexible.
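
The three recognition messages described above behave like a two-threshold classification. The Python sketch below is an illustration only. It assumes a match score expressed as a distance where lower is better, and the threshold values, the function names, and the three-failure rule are hypothetical rather than taken from the ITT recognizer. The second function suggests one way out of the trap described above: after repeated failures on the same phrase, offer re-enrollment of that phrase alone.

```python
def classify(distance, open_threshold=0.3, forced_threshold=0.6):
    """Map a match distance (lower is better) to one of the three messages."""
    if distance <= open_threshold:
        return "OPEN RECOGNITION"
    if distance <= forced_threshold:
        return "FORCED RECOGNITION"
    return "FORCED RECOGNITION FAILURE"


def should_offer_reenrollment(distances, max_failures=3):
    """True once the last `max_failures` attempts at a phrase all failed."""
    recent = distances[-max_failures:]
    return len(recent) == max_failures and all(
        classify(d) == "FORCED RECOGNITION FAILURE" for d in recent
    )


print(classify(0.2), "/", classify(0.5), "/", classify(0.9))
print(should_offer_reenrollment([0.9, 0.8, 0.7]))
```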

4.  Syntax 

Syntax  accounted  for  an  experimentally  significant  performance
difference  in  the  conduct  of  this  evaluation.  For  the  measure
which  divides  the  number  of  words  reported  correctly  by  the  number
of  words  spoken,  recognizer  performance  was  at  nearly  98  percent  for
both  noisy  and  quiet  conditions  utilizing  separate  syntaxes.  The  combined
syntax,  however,  scored  at  near  95  percent  and  93  percent,
respectively,  for  noisy  and  quiet  conditions.  Similar  results  were
obtained  with  the  measure  which  divides  the  words  reported  incorrectly
by  the  number  of  words  spoken:  combined/noise  (.05),  combined/quiet
(.07),  separate/noise  (.019),  and  separate/quiet  (.017).
These  results  are  statistically  significant,  at  least  at  the  p  <  .06  level,
and,  it  is  anticipated,  would  be  practically  significant  for  the  CATCC
environment  because  of  the  need  for  accuracy  in  the  operational  environment
and  the  expected  volume  of  input  during  flight  operations.
Although  an  on-site  evaluation  of  the  CATCC  requirements  proved
logistically  impossible  during  the  conduct  of  this  research,  it  will  be
important  for  the  final  design  effort  to  weigh  the  magnitude  of  input
and  the  cost  of  errors  against  the  cost  of  implementing  the  separate
systems.
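
The two measures used above are simple ratios, and a short Python sketch makes the trade-off between error rate and input volume concrete. The function names, the word counts in the first example, and the 1,000-word workload are hypothetical. Only the error rates .07 (combined/quiet) and .017 (separate/quiet) come from the results reported in this section.

```python
def recognition_rates(words_spoken, words_correct, words_incorrect):
    """Return (correct rate, incorrect rate), each divided by words spoken."""
    if words_spoken <= 0:
        raise ValueError("words_spoken must be positive")
    return words_correct / words_spoken, words_incorrect / words_spoken


def expected_errors(error_rate, words_spoken):
    """Whole number of misrecognized words expected for a given workload."""
    return round(error_rate * words_spoken)


# Hypothetical counts; correct and incorrect need not sum to words spoken,
# since some words may be rejected or missed rather than misrecognized.
print(recognition_rates(1200, 1176, 21))

# Reported incorrect-word rates applied to an assumed 1,000-word workload.
print(expected_errors(0.07, 1000), expected_errors(0.017, 1000))
```

Under that assumed workload the combined syntax would produce roughly four times as many misrecognized words as the separate syntaxes, which is the kind of figure the final design effort would weigh against implementation cost.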

A  final  evaluation  comment  is  in  order  here  regarding  the
syntaxes  utilized.  Many  of  the  subjects  tested  had  direct  CATCC  or
flight  experience,  the  details  of  which  are  found  in  Chapter  V.  Nearly
all  of  the  subjects  at  one  point  or  another  commented  about  the  inappropriateness
of  some  aspect  of  the  syntax.  That  is,  subjects
expressed  such  things  as  “You’d  never  say  that”  or  “There’s  no  such
thing  as  ANGELS  90.”  It  is  believed  that  this  is  related  again  to  the
type  of  development  (i.e.,  prototype)  model  used  for  this  design,  again
illustrating  Pressman’s  idea  of  the  developer  making  “implementation
compromises  in  order  to  get  a  prototype  working  quickly.”  [Ref.  21:p.
23]  These  concessions,  while  facilitating  rapid  development,  can  lead
to  less-than-optimal  performance  in  a  final  system  by  providing  more
branches  on  a  specific  node  than  are  actually  legitimate  real-world
choices.  It  is  of  paramount  importance  that  these  syntactic  compromises
incorporated  into  the  prototype  model  not  be  overlooked  here
or  forgotten  during  final  product  development.  A  careful  analysis  and
design  of  the  actual  syntactic  rules  of  the  CATCC  operators  should
preclude  errors  caused  by  unnecessary  nodal  branching.

5.  Other  User  Interface  and  General  Evaluation  Concerns

The  current  training  interface  for  the  user  is  fixed.  He  progresses
through  the  series  of  menu  choices  described  in  Chapter  IV
and  subsequently  will  have  a  set  of  voice-recognition  templates  on  file
for  his  use.  This  may  be  an  appropriate  training  methodology  for  this
environment  and  technology  combination,  but  in  practice  the  experimenters
found  that  a  combination  of  pre-training  with  the  vocabulary
words/phrases  and  demonstration  proved  very  effective;  that  is,  it
required  fewer  training  session  restarts.  Whether  this  effectiveness  is
directly  related  to  the  training  method  is  unclear  because  of  the  lack
of  flexibility  of  the  enrollment  and  training  of  templates.  For  example,
if  we  could  start  and  stop  enrollment  at  any  location  or  just  go  back
and  re-enroll  one  word,  would  we  need  to  pre-teach  or  demonstrate?
Perhaps  not,  but  this  illustration  makes  clear  the  importance  of  carefully
analyzing  the  training  method  to  be  utilized  with  the  final  system
to  provide  a  link  between  the  new  user  and  the  system.

Once  again  recalling  the  issue  of  the  use  of  the  prototype  as  a
design  paradigm,  we  should  emphasize  the  need  for  human  factors
requirements  analysis.  The  prototype  as  tested  received  generally
high  marks  from  users  when  questioned  about  the  quality  or  ergonomics
of  the  work  station,  display,  or  microphones  utilized.  This
could  be  anticipated  in  a  laboratory  environment  for  numerous  reasons,
including  the  following:

1.  Subjects  that  are  not  actual  users  and  are  unaware  of  potential
pitfalls  in  the  human/system  interface.

2.  Lack  of  an  operational  environment  to  provide  an  accurate  backdrop
for  the  system  operation.

3.  Lack  of  realism  associated  with  laboratory  experimentation,
especially  when  attempting  to  duplicate  complex  (e.g.,  shipboard)
environments.

This  is  problematic,  however,  for  attempting  to  generalize  to  the
actual  operational  environment  of  the  CATCC  because  the  key  human
interface  factors  identified  by  researchers  such  as  Monk  [Ref.  23]  are
not  the  same  across  the  laboratory  and  operational  environments.
These  factors  are  the  user  population,  the  user  task,  and  the  user
environment.  The  user  population,  that  is,  sailors  from  an  aircraft
carrier,  could  be  utilized  in  laboratory  experiments  even  though  this
was  logistically  impractical  for  the  present  work.  This  would  narrow
the  human  factors  consideration  to  the  user  task  and  user  environment,
which  still  remain  formidable  human  factors  challenges.  Examples
of  some  of  the  numerous  design  questions  to  be  considered
include:

1.  What  types  of  control  devices  are  most  appropriate  for  the  task 
and  environment?  (e.g.,  mouse,  joystick,  foot-feed  to  control  the 
voice  input  on/off  switch) 

2.  What  types  of  software  display  devices  are  appropriate  to  user
output?  (e.g.,  if  the  user  needs  symbols,  how  should  they  be  displayed?
How  large  should  they  be?  What  colors  should  the  displays
use?  Are  there  any  domain-specific  colors/symbols  that
should  be  included/avoided?)

3.  What  types  of  software  control  should  be  available?  (e.g.,  should 
the  user  be  forced  through  menus  or  will  commands  be  available 
for  higher  performance?) 

4.  What  types  of  hardware  display  devices  should  be  utilized?  (e.g.,
raster  scan  displays,  liquid  crystal  displays,  plasma  panels,  printers,
or  even  voice  advisories  through  voice  synthesis)

The  detailed,  all-encompassing  scope  of  these  considerations,  while
beyond  that  of  this  thesis,  must  not  be  underestimated.  Users  who
range  from  new  trainees  to  those  with  hours  of  experience,  combined
with  a  task  that  can  be  extremely  fast-moving  but  which  requires
accuracy  in  an  environment  which  may  be  low  in  light  but  high  in
stress,  noise,  and  concurrently  required  input  tasks,  create  a  nearly
herculean  task  for  the  analyst  striving  to  optimize  the  user/system
interface.  But  the  human  interface  requirements  analysis  and  subsequent
input  for  overall  design  may  determine  whether  voice  input  can
be  useful  in  the  CATCC  environment.

One  final  evaluative  comment  is  in  order  here.  As  was  previously
discussed,  a  number  of  weeks  were  spent  on  software  and
minor  hardware  problems,  which  were  finally  remedied  through  consultations
between  the  experimenters,  NOSC  developers,  and  ITT  hardware
experts.  These  types  of  difficulties  could  be  effectively  coped  with
during  laboratory  work  because  of  its  static  nature.  These  same  sorts
of  aggravation  would  render  the  entire  system  virtually  worthless  in
the  operational  environment  of  a  CATCC.  Many  of  the  test  subjects
experienced  the  errors  and  system  crashes  during  initial  training,  and
subjects  with  CATCC  experience  reflected  a  healthy  degree  of  skepticism
regarding  whether  voice  input  technology  was  appropriate  for
the  CATCC  (Figure  5.8)  and  whether  they  themselves  would  accept  a
fully  developed  voice  input  status  board  system  (Figure  5.9).
Constructively,  then,  we  must  say  that  the  present  prototype  system  is
not  ready  for  shipboard  presentation,  even  if  it  is  merely  used  as  the
requirements  analysis  tool  it  basically  is.  As  it  was  utilized,  there  were
too  many  errors  still  within  the  system  for  it  to  be  used  as  an  effective
method  of  helping  analyze  the  CATCC  requirements.  In  addition,
there  are  considerations  with  regard  to  creating  a  “negative”  impression
of  the  technology  in  an  environment  where  the  status  quo
methodology  is  so  deeply  rooted  in  naval  carrier  tradition.  However,
some  subset  of  the  prototype,  or  a  more  completely  developed  prototype,
must  be  tested  aboard  ship  in  the  operational  environment  to
satisfy  the  requirements  analysis  fully.  The  current  prototype
is  simply  not  ready.  Recommendations  concerning  this  type  of  testing
are  contained  in  the  following  section.

B.  RECOMMENDATIONS

1.  General

Prototype  testing  requires  a  minimum  level  of  functionality
prior  to  system  field  testing.  Any  system  which  exhibits  unpredictable
and  anomalous  behavior  cannot  be  adequately  or  fairly  evaluated.  With
that  concept,  we  are  separating  our  recommendations  into  two  distinct
categories:  short-  and  long-term  recommendations.  Our  short-term
recommendations  address  those  deficiencies  that  must  be  solved
prior  to  shipboard  testing  of  the  prototype.  Issues  or  problems  that
must  be  considered  prior  to  full-scale  development,  but  which  are  not
considered  essential  to  the  evaluation  of  the  prototype,  are  found  in
our  long-term  recommendations.

2.  Hardware  Recommendations

During  our  evaluation,  the  hardware  components  (processor,
displays,  and  ITT  VRS  1280  recognizer)  all  performed  without  a  single
hardware  failure.  However,  there  are  several  short-term  recommendations
regarding  implementation  of  the  current  suite  of  hardware.
They  are:

1.  Incorporate  the  manufacturer’s  recommended  microphone  system.
The  microphone  system  was  our  initial  problem.  However,  using
the  Plantronics  microphone,  recommended  by  ITT  representatives,
we  were  able  to  correctly  communicate  with  the  hardware.

2.  Operate/evaluate  the  complete  prototype.  To  date,  the  system
has  only  been  evaluated  in  a  scaled-down  version  of  the  full
prototype  destined  for  the  ship.  We  strongly  recommend  that
prior  to  field-testing,  all  three  recognizers  with  a  full  complement
of  displays  and  input  devices  be  installed  and  fully  tested,  as
originally  designed.

3.  Acquire,  test,  and  implement  a  large  panel  display.  A  major
component  of  the  system  will  be  the  displays  used  locally  in  the
CATCC  and  those  used  as  remote  repeaters  throughout  the  ship.
Prior  to  at-sea  prototype  testing,  we  recommend  implementation
of  a  prototype  large  screen  flat-panel  display  legible  at  a  distance
of  several  feet  in  low-lit  conditions.  Successful  demonstration
and  validation  of  the  concept  is  dependent  upon  successful
incorporation  of  at  least  a  prototype  flat  panel  display.

4.  Develop  a  shipboard  cabling  and  power  distribution  plan.  The 
cramped  CATCC  spaces  require  development  of  a  detailed  cabling 
plan  prior  to  installation.  Development  of  such  a  plan  will  avoid 
on-site  wiring  problems.  The  plan  should  map  power  outlet 
sources  required  to  those  available,  and  the  specific  location  of 
cabling  runs. 

5.  Test  and  implement  a  remote  display.  One  of  the  major  advantages
of  the  system  is  the  ability  to  display  CATCC  information
remotely,  thereby  eliminating  the  human  network  of  sailors.  We
recommend  that  this  capability  be  fully  tested  and  implemented,
using  flat  panel  display  technologies,  during  the  shipboard
testing.

6.  Develop  a  hardware  performance  limitation  baseline.  In  the
course  of  evaluating  this  prototype,  specific  performance  criteria
should  be  developed,  tested,  and  documented.  For  example,
what  are  the  expected  error  rates  with  respect  to  various  noise
levels?  Above  what  level  of  noise  will  the  recognizer  fail  to  recognize
speech?  Another  criterion  will  be  the  response  time
under  a  variety  of  loading  conditions.  The  primary  concern  here
is  to  determine  at  what  level  of  operation  the  components
become  saturated  and  to  what  degree  the  performance  degrades.
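
The performance baseline proposed in item 6 amounts to tabulating error rate against noise level and locating the level at which an acceptable limit is exceeded. The Python sketch below shows one way such a table could be built. The trial records, the decibel values, and the 5 percent limit are invented purely for illustration and are not measurements from this evaluation.

```python
from collections import defaultdict


def error_rate_by_noise(trials):
    """trials: iterable of (noise_db, words_spoken, words_incorrect)."""
    spoken = defaultdict(int)
    wrong = defaultdict(int)
    for noise_db, n_spoken, n_wrong in trials:
        spoken[noise_db] += n_spoken
        wrong[noise_db] += n_wrong
    # Keys are inserted in ascending noise order.
    return {db: wrong[db] / spoken[db] for db in sorted(spoken)}


def first_level_exceeding(rates, limit):
    """Lowest noise level whose error rate is above the limit, else None."""
    for db, rate in rates.items():
        if rate > limit:
            return db
    return None


# Made-up records: (noise level in dB, words spoken, words incorrect).
trials = [(70, 200, 4), (70, 200, 2), (85, 400, 20), (100, 400, 80)]
rates = error_rate_by_noise(trials)
print(rates)
print(first_level_exceeding(rates, limit=0.05))
```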

Long-term  recommendations  associated  with  the  hardware
are  more  concerned  with  looking  beyond  the  prototype.  Of  primary
concern  is  that  the  prototype  does  not  dictate  the  ultimate  hardware
(make  and  model)  or  the  overall  architecture  to  be  employed.  What  is
important  over  the  long  term  are  such  hardware-related  issues  as
performance,  maintainability,  and  reliability  of  the  system.  Accordingly,
we  make  the  following  long-term  hardware  recommendations:

1.  Consider  alternative  architectures.  There  are  serious  limitations
associated  with  the  current  architecture.  The  prototype,  as
implemented,  has  a  single  point  of  failure.  That  is,  if  the  Sun
processor  is  no  longer  operative,  the  entire  system  is  rendered
inoperative.  If  this  were  to  occur  while  deployed,  the  components
would  become  unwanted  baggage  in  the  cramped  CATCC
spaces  until  the  system  could  be  repaired.  In  addition,  there  is
no  storage  redundancy.  All  programs,  voice  templates,  and  systems
software  are  stored  on  a  single  disk.  Disk  failure  caused  by
vibration  or  dust  (not  unlikely  in  the  carrier  environment)  would
result  in  complete  loss  of  data  and  voice  templates  (except  for
information  archived  on  alternate  media).  A  distributed  network
architecture  based  on  stand-alone  personal  computers,  each
equipped  with  a  recognizer  and  sufficient  storage  for  voice  templates,
might  be  superior  to  the  single  processor  system  found  in
the  current  prototype.

2.  Consider  alternative  equipment.  The  prototype  system  has  several
immediate  disadvantages.  Component  size  (large  footprint),
availability  of  maintenance  while  deployed,  and  lack  of  ruggedization
are  all  long-term  issues  that  must  ultimately  be  addressed.
Any  system  developed  for  Navy-wide  use  must  include  these
issues  in  the  system  specification.

3.  Optimize  the  input  interface.  Some  combination  of  voice  input
and  keyboard/pointing  device  will  optimize  the  man-machine
interface.  Considerable  effort  should  be  devoted  to  identifying
the  best  combination  of  input  modalities.

3.  Software  Recommendations 

Unlike  the  hardware  components,  operation  of  the  software
was  not  without  problems.  The  short-term  software  recommendations
generally  address  deficiencies  that  must  be  corrected  prior  to  operation  by
CATCC  personnel.  Our  long-term  recommendations  are  not  critical
for  concept  demonstration  but  will  become  important  during  full-scale
development.  Recognizing  the  pre-production  nature  of  the  ITT  recognizer,
our  recommendations  will  not  distinguish  between  recognizer
software  problems  and  those  problems  caused  by  software  developed
by  NOSC.  Instead,  we  will  recognize  the  problems  in  a  generic  sense,
leaving  resolution  to  some  combination  of  improved  ITT  and  NOSC
software.  Our  short-term  recommendations  include  the  following:

1.  Eliminate  unpredictable  operation.  Included  in  this  category  are
the  recognizer  errors  previously  identified.  The  system  must  not
be  installed  in  the  CATCC  without  resolution  of  the  various  recognizer
and  lost  communications  errors.

2.  Improve  the  training  interface.  The  present  training  system  is 
inadequate  for  the  task.  The  inability  to  easily  retrain/re-enroll 
selected  words  is  considered  a  significant  deficiency.  The 
operator  should  be  allowed  to,  at  any  time,  retrain  or  re-enroll  a 
word  with  a  minimum  of  user  command  input.  In  addition,  the 
operator  should  be  allowed  to  discontinue  an  enrollment  session 
without  having  to  re-start  the  enrollment  process.  Finally,  the 
user  should  be  able  to  practice  enrolling  prior  to  actually  creating 
voice  templates.  We  recommend  that  the  enrollment  process  be 
simplified,  requiring  at  most  one  hour  to  create  a  basic  set  of 
templates. 

3.  Hide  the  operating  system  from  the  user.  The  user  should  not  be
required  to  become  familiar  with  any  UNIX  operating  system
commands.  File  maintenance  and  system  start-up/restart  and
backup  procedures  should  all  be  menu-driven  events.  It  must  not
be  assumed  that  the  operators  are  computer  literate  or  that  they
will  become  familiar  with  the  UNIX  operating  environment.

4.  Incorporate  a  speaker  volume  meter.  Whether  this  is  accomplished
via  software  or  some  temporary  hardware  solution  is
unimportant.  The  primary  concern  is  that  users  learn  what
speech  volume  is  necessary  to  train  and  use  the  system  correctly.

5.  Improve  procedures  for  starting  the  status  board  display  application.
The  current  series  of  commands  necessary  to  start  the  application
needs  to  be  simplified  to  a  single  menu  selection.  Requiring
a  series  of  commands  to  be  entered  on  a  variety  of  terminals  is
both  confusing  and  beyond  the  capability  of  most  novice  users.

6.  Tune  the  recognizer  for  the  CATCC  environment.  The  engineering
parameter  file  should  be  adjusted  for  this  particular  syntax
and  environment.
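
The speaker volume meter recommended in item 4 could be approximated in software by measuring the level of each block of microphone samples. The following Python sketch assumes 16-bit signed samples and uses hypothetical advice thresholds of -30 and -6 dB relative to full scale. Neither the thresholds nor the function names come from the prototype, and real values would have to be set by experiment.

```python
import math


def rms_dbfs(samples, full_scale=32768):
    """Root-mean-square level of a sample block, in dB relative to full scale."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(rms / full_scale)


def level_advice(dbfs, low=-30.0, high=-6.0):
    """Turn a level into a prompt the user can act on (assumed thresholds)."""
    if dbfs < low:
        return "SPEAK LOUDER"
    if dbfs > high:
        return "SPEAK SOFTER"
    return "OK"


print(level_advice(rms_dbfs([0, 0, 0, 0])))
print(level_advice(rms_dbfs([3277, -3277, 3277, -3277])))
```

A display driven by `level_advice` would give the user the feedback on voice volume that the evaluation found missing during template creation.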


Our  long-term  recommendations,  while  not  considered  critical
for  the  development  of  the  prototype,  are  nonetheless  issues  of
major  concern  during  full-scale  development.  They  are  provided  as
suggestions  for  future  endeavors.


1.  Solicit  operator  input.  Individuals  (operators)  intimately  familiar
with  the  environment  should  be  consulted  in  the  implementation
of  any  of  the  software  component  interfaces.

2.  Develop  the  interface  in  terms  familiar  to  the  operator.  Avoid  at
all  costs  unfamiliar  terms  or  concepts  when  presenting  information
to  the  operator.  Eliminate  speech  technology  terms  such  as
“FORCED  RECOGNITION  FAILURE”  or  “OPEN  RECOGNITION.”

3.  Ensure  that  the  software  is  sailor  proof.  All  software  components
must  protect  the  user  from  the  unpredictability  caused  by  incorrect
or  unexpected  inputs  or  abnormal  execution.

4.  Syntax  Recommendations

Because  of  the  relative  importance  of  syntax  in  connected
speech  systems,  we  are  making  the  following  recommendations.
These  are  considered  both  short-  and  long-term  suggestions.  In
general,  the  syntax  “operated”  correctly,  but  the  following  recommendations
are  offered  as  a  means  of  improving  the  application  and,  if
possible,  should  be  incorporated  into  the  prototype  syntax:

1.  Ensure  syntactic  correctness.  The  present  syntax  does  not  accu¬ 
rately  reflect  valid  phrases. 

2.  Allow  for  error  correction.  As  discussed  in  Chapter  II,  a  variety  of 
error-correction  schemes  should  be  incorporated. 

3.  Implement  task- specific  syntaxes.  Our  research  demonstrated 
that  smaller,  more  specific  syntaxes  performed  significantly 
better.  The  application  should  be  designed  such  that  a  unique 
syntax  is  available  for  each  of  the  displays. 

4. Solicit user input in the syntax design process. A system for
verbally communicating status board information presently exists
with the manual system. The operators should be involved in
developing the syntaxes consistent with their current approach.
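
As a rough illustration of recommendation 3, the sketch below keeps one small vocabulary per display and accepts a phrase only if its field belongs to the active display. The field lists are guessed from the Appendix H test files, and the names `DISPLAY_FIELDS`, `split_phrase`, and `is_valid` are hypothetical; none of this is taken from the prototype's actual syntax files.

```python
# Illustrative only: these vocabularies are guessed from the Appendix H test
# files and are not the prototype's actual syntax definitions.
ACTIONS = {"ADD", "DELETE", "UPDATE"}

DISPLAY_FIELDS = {
    "approach": {"PROFILE", "BINGO STATE", "FUEL STATE", "ON TIME",
                 "SEQUENCE", "APPROACH RECEIVED", "MODE REQUEST"},
    "departure": {"AIRBORNE", "REMARKS", "RADAR CONTACT", "ARCING",
                  "TIME OFF", "BUTTON", "MOVE"},
    "marshal": {"DISTANCE", "ANGELS", "CHECK IN", "HOLDING", "COMMENCING",
                "FIRST APPROACH TIME", "SECOND APPROACH TIME"},
}


def split_phrase(phrase):
    """Split 'ADD 5 7 5 PROFILE TRAP' into (action, side_number, rest)."""
    words = phrase.split()
    action, side = None, None
    if words and words[0] in ACTIONS:
        action = words[0]
        side = "".join(words[1:4])  # three spoken digits, e.g. "575"
        words = words[4:]
    return action, side, " ".join(words)


def is_valid(display, phrase):
    """Return True if the phrase fits the small syntax of one display."""
    action, side, rest = split_phrase(phrase)
    if action is not None and not (len(side) == 3 and side.isdigit()):
        return False
    if rest.startswith("CLEAR "):
        rest = rest[len("CLEAR "):]
    if not rest:
        # An action plus side number alone is a phrase, e.g. "DELETE 7 1 5".
        return action is not None
    fields = DISPLAY_FIELDS[display]
    return any(rest == f or rest.startswith(f + " ") for f in fields)


print(is_valid("approach", "ADD 5 7 5 PROFILE TRAP"))
print(is_valid("approach", "CLEAR ARCING"))
```

Because each display sees only its own fields, a phrase such as CLEAR ARCING is rejected while the approach display is active, which is the kind of narrowing the recommendation describes.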


C.  CONCLUSIONS 

Based on our military experience, the extensive “hands on”
experience with the prototype system, and the collective opinions of our
test subjects, we have developed three significant conclusions.

First, we believe that the input, display, and dissemination of
aircraft status information aboard an aircraft carrier is a process which
can be more efficiently and effectively accomplished using automation.
We are not alone in our opinion; other carriers are already using
microcomputers to manage and display CATCC information in a very
similar application [Ref. 24]. There is no doubt that the potential
exists to dramatically increase the accuracy and timeliness of this
critical information throughout the ship.
Second, voice recognition technologies offer an input mechanism
which  appears  well-suited  to  the  CATCC  environment.  We  believe  that 
with  training,  proper  equipment,  and  well-designed  software,  a  voice- 
based  automated  display  system  could  be  effectively  implemented.  Our 
research  demonstrated  that  even  with  minimal  training,  and  despite 
significant  software  difficulties,  we  were  able  to  achieve  acceptable 
recognition  rates  in  a  noisy  environment. 

Finally, if the short-term recommendations are adopted, the
prototype can, and should, be tested aboard an operational aircraft carrier
as a means of validating and demonstrating the concept outside the
protective shelter of a laboratory.
Rejection templates


APPENDIX  C 

MASTER  INSTRUCTION  SHEET 


1. To conduct a test, first set up, at a minimum, the first four tests.
To do this, open a window, type the phrase, and then, using the
suntools pull-down menu, “close” the window. All eight of the following
hostpump commands will be used. The only change is to substitute
the user’s initials for “INIT”:


hostpump INIT.asyn.q approach.pump approach.syn
hostpump INIT.dsyn.q departure.pump departure.syn
hostpump INIT.msyn.q marshall.pump marshall.syn
hostpump INIT.csyn.q combined.pump combined.syn


hostpump INIT.asyn.n approach.pump approach.syn
hostpump INIT.dsyn.n departure.pump departure.syn
hostpump INIT.msyn.n marshall.pump marshall.syn
hostpump INIT.csyn.n combined.pump combined.syn
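
The eight command lines differ only in the initials, the syntax letter, and the final suffix, so they can be generated rather than retyped. The following is a minimal sketch; the helper name `hostpump_commands` is hypothetical, and reading the `q` and `n` suffixes as the quiet and noise conditions is an assumption.

```python
# Syntax letter and file stem for each of the four test files.
SYNTAXES = [("a", "approach"), ("d", "departure"),
            ("m", "marshall"), ("c", "combined")]

# Assumption: "q" marks the quiet runs and "n" the noise runs.
CONDITIONS = ["q", "n"]


def hostpump_commands(initials):
    """Return the eight hostpump command lines for one test subject."""
    return [
        f"hostpump {initials}.{letter}syn.{cond} {stem}.pump {stem}.syn"
        for cond in CONDITIONS
        for letter, stem in SYNTAXES
    ]


for line in hostpump_commands("js"):
    print(line)
```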


2. Now train the user as usual (using “host”). At the completion of
the training and prior to running any recognition, you MUST go to
sd0/newtrain/NOSC6/TEMPLATE and execute the following:

cp subject_last_name/point.subject_initials subject_last_name/p.subj_init

EX:

cp spegele/point.js spegele/p.js

NOTE: If you don’t do this, you will get an error message when
loading hostpump.
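
The naming pattern in step 2 can be written out as a small helper that builds the copy command from the subject's last name and initials. This is only a sketch of the pattern shown in the example; the function name `template_copy_command` is hypothetical.

```python
from pathlib import PurePosixPath


def template_copy_command(last_name, initials):
    """Build the cp command that renames a subject's template file.

    Follows the pattern in the example: last_name/point.initials is
    copied to last_name/p.initials.
    """
    directory = PurePosixPath(last_name)
    source = directory / f"point.{initials}"
    target = directory / f"p.{initials}"
    return f"cp {source} {target}"


print(template_copy_command("spegele", "js"))
```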
3. Now exit host and set up to conduct the test. If noise is required,
activate the window with “input trainer” and set for 1 sec. Also, don’t
forget to turn on the radio beside you. Take a noise level reading and
record the test results.

4.  Using  a  copy  of  the  file,  annotate  who  the  subject  is,  any  training 
difficulties  (problem  words,  etc.),  time  of  day,  and  any  substitution  or 
misspeak  errors. 

NOTES: 

Turn  Dectalk  off  during  quiet  tests 

Dectalk  setting  5  o'clock  for  75  dBA 

Subject  brief: 

mic  positioning 
speaking  rate  (speed) 
speaking  style  (normal) 
give  example 


APPENDIX  D 


TEST SUBJECT INFORMATION


NAME: 

TIME  START: 
TRAINING  COMPLETE: 
TEST  START: 

TEST  COMPLETE: 
PROBLEMS: 
APPENDIX E


TRAINING  VERIFICATION  SHEET 


UPDATE  0  0  0 
UPDATE  1  1  1 
UPDATE  2  2  2 
ADD  3  3  3
ADD  4  4  4
ADD  5  5  5 
DELETE  6  6  6 
DELETE  7  7  7 
DELETE  8  8  8 
DELETE  9  9  9 

CLEAR  BUTTON 

CLEAR  ANGELS 

CLEAR  DISTANCE 

CLEAR  RADAR  CONTACT 

CLEAR  FIRST  APPROACH  TIME 

CLEAR  SECOND  APPROACH  TIME 

CLEAR  CHECK  IN 

CLEAR  HOLDING 

CLEAR  COMMENCING 

CLEAR  REMARKS 

CLEAR  PROFILE 

CLEAR  MODE  REQUEST 

CLEAR  APPROACH  RECEIVED 

CLEAR  BINGO  STATE 

CLEAR  SEQUENCE 

CLEAR  AIRBORNE 

CLEAR  ARCING 

CLEAR  ON  TIME 

CLEAR  TIME  OFF 

CHECK  IN  FUEL  STATE  5  P  9 
CHECK  IN  FUEL  STATE  1  P  4 
CHECK  IN  FUEL  STATE  3  P  8 

PROFILE  TRAP 
PROFILE  BOLTER 
PROFILE  DOWNWIND 
PROFILE  INBOUND 
PROFILE  TO  TANKER
PROFILE  FOUL  DECK  WAVEOFF 
PROFILE  TECHNICAL  WAVEOFF 
PROFILE  AIRBORNE 

MODE  REQUEST  5  ALPHA 
MODE  REQUEST  3  ALPHA 
MODE  REQUEST  8  ALPHA 

APPROACH  RECEIVED  2  BRAVO 
APPROACH  RECEIVED  0  BRAVO 
APPROACH  RECEIVED  1  BRAVO 

SEQUENCE  4  ALPHA 
SEQUENCE  9  ALPHA 
SEQUENCE  6  ALPHA 

REMARKS  TACAN  DOWN 
REMARKS  INS  DOWN 
REMARKS  TRANSMITTER  DOWN 
REMARKS  RECEIVER  DOWN 
REMARKS  NORDO  DOWN 
REMARKS  ACLS  DOWN 
APPENDIX F

TESTING MATRIX


                           CONDITION

SUBJECT       separate/   combined/   combined/   separate/
              noise       noise       quiet       quiet

SUBJECT 2     1           2           3           4
SUBJECT 4     4           1           2           3
SUBJECT 8     3           4           1           2
SUBJECT 10    2           3           4           1

SUBJECT 3     1           2           3           4
SUBJECT 5     4           1           2           3
SUBJECT 11    1           4           3           2
SUBJECT 7     4           3           2           1
SUBJECT 6     3           2           1           4
SUBJECT 9     2           1           4           3
SUBJECT 12    2           3           4           1
SUBJECT 1     1           4           3           2
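
Reading the matrix as one row per subject and one column per condition, each entry can be taken as the position of that condition in the subject's test session. That reading is an assumption, as are the names used below. The sketch checks that every subject is assigned each position exactly once and recovers the running order for a given subject.

```python
# Column order of the testing matrix.
CONDITIONS = ("separate/noise", "combined/noise",
              "combined/quiet", "separate/quiet")

# Subject number -> entries for the four conditions, in column order.
TESTING_MATRIX = {
    2: (1, 2, 3, 4), 4: (4, 1, 2, 3), 8: (3, 4, 1, 2), 10: (2, 3, 4, 1),
    3: (1, 2, 3, 4), 5: (4, 1, 2, 3), 11: (1, 4, 3, 2), 7: (4, 3, 2, 1),
    6: (3, 2, 1, 4), 9: (2, 1, 4, 3), 12: (2, 3, 4, 1), 1: (1, 4, 3, 2),
}


def rows_are_permutations(matrix):
    """True if every subject has each of the positions 1-4 exactly once."""
    return all(sorted(row) == [1, 2, 3, 4] for row in matrix.values())


def condition_order(subject):
    """Conditions for one subject, sorted by their assumed session position."""
    positions = TESTING_MATRIX[subject]
    return [name for _, name in sorted(zip(positions, CONDITIONS))]


print(rows_are_permutations(TESTING_MATRIX))
print(condition_order(4))
```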
APPENDIX G


SUBJECT INSTRUCTIONS

1. There are four files which we will, during the conduct of the test,
ask you to call up. In order to display the contents of a file, you must
type “more filename.extension” where the filename and extension are
one of the below listed:

approach.pump 
departure.pump 
marshall.pump
combined.pump

During the test, you may have to scroll through the file to display
phrases not initially shown. To do this, first turn off the microphone,
then hit the carriage return until you see “END OF TEST.” Then
turn the mike on. To leave the file, continue depressing the carriage
return until you are returned to the UNIX prompt
“tamale=/usr.MC68020/sd0/stat/SCENARIO.”

2.  Phrases  read  from  the  test  file  should  be  read  in  the  same  manner 
as  you  practiced;  a  short  1-3  sec.  pause  is  sufficient  between  phrases. 

There  is  no  need  to  rush  the  reading  and  you  should  not  be  concerned 
with  exceeding  the  speed  of  the  voice  recognizer. 

3. You may leave the microphone open (ON) during all phases of
training and testing. If you feel a need to momentarily pause, then you
should turn the microphone off until ready to resume voice
recognition.

4.  If  during  the  test  you  inadvertently  misspeak  and  realize  your 
error,  then: 

•  Turn  the  microphone  off; 

•  Alert  the  tester; 

•  Turn  the  microphone  on; 

•  Repeat  the  phrase  (correctly); 

•  Continue  the  test. 


APPENDIX  H 


TEST  FILES 


ADD  5  7  5  PROFILE  TRAP 
DELETE  9  1  4  PROFILE  BOLTER 
CLEAR  PROFILE 
BINGO  STATE  8  POINT  8 
CLEAR  SEQUENCE 
PROFILE  DOWNWIND 

FUEL  STATE  6  POINT  7 
CLEAR  APPROACH  RECEIVED 
UPDATE  0  6  0  PROFILE  TO  TANKER 
PROFILE  INBOUND 

DELETE  0  8  1  BINGO  STATE  7  POINT  6 
ON  TIME  5  5 

DELETE  7  1  5

DELETE  3  3  3  FUEL  STATE  1  POINT  9 

CLEAR  FUEL  STATE 

ADD  2  4  9  FUEL  STATE  2  POINT  0 

CLEAR  ON  TIME 

FUEL  STATE  5  POINT  2 


CLEAR  MODE  REQUEST 

UPDATE  5  5  4  PROFILE  FOUL  DECK  WAVEOFF 
ADD  4  8  2  PROFILE  TECHNICAL  WAVEOFF 
ON  TIME  5  4 
CLEAR  BINGO  STATE 
UPDATE  3  2  1  ON  TIME  2  3 

ADD  6  5  2  AIRBORNE 
REMARKS  TRANSMITTER  DOWN 
ADD  3  4  1  RADAR  CONTACT 
CLEAR  ARCING 
UPDATE  8  0  8  TIME  OFF  4  2 
CLEAR  BUTTON 

DELETE  7  8  9  REMARKS  INS  DOWN 
MOVE  16 

UPDATE  1  3  3  MOVE  5 
DELETE  2  2  7  ARCING 
REMARKS  NORDO  DOWN 
CLEAR  REMARKS 

ADD  4  9  6  REMARKS  ACLS  DOWN 
TIME  OFF  3  9 

REMARKS  RECEIVER  DOWN 

MOVE  1  5 

SEND 

CLEAR  RADAR  CONTACT 


UPDATE  9  1  8  BUTTON  11 
ADD  8  7  6  ARCING 
REMARKS  TACAN  DOWN 
AIRBORNE 

UPDATE  2  3  7  MOVE  20  7 
BUTTON  19 

CLEAR  DISTANCE 

UPDATE  4  7  8  ANGELS  3 

CLEAR  SECOND  APPROACH  TIME 

DELETE  1  0  7  CHECK  IN  FUEL  STATE  4  POINT  1 

CLEAR  HOLDING 

ADD  6  5  6  FIRST  APPROACH  TIME  3  0 
CLEAR  COMMENCING 

UPDATE  5  6  0  APPROACH  RECEIVED  9  ALPHA 

SECOND  APPROACH  TIME  4  5 

UPDATE  8  2  2  REMARKS  TACAN  DOWN 

ADD  9  9  4  COMMENCING  FUEL  STATE  3  POINT  6 

CLEAR  FIRST  APPROACH  TIME 

HOLDING  FUEL  STATE  9  POINT  5 
DISTANCE  2  8 
CLEAR  ANGELS 
ADD  7  0  0  HOLDING 
CLEAR  CHECK  IN 

ADD  6  6  9  HOLDING  FUEL  STATE  4  POINT  7 
DELETE  0  0  0  SEQUENCE  0  BRAVO

DELETE  1  9  3  MODE  REQUEST  7  BRAVO

BINGO  STATE  3  POINT  2 

UPDATE  0  4  5  SEQUENCE  5  ALPHA 

DISTANCE  30  6 

CLEAR  TIME  OFF 


APPROACH  SYNTAX 

ADD  5  7  5  PROFILE  TRAP 
DELETE  9  1  4  PROFILE  BOLTER 
CLEAR  PROFILE 
BINGO  STATE  8  POINT  8 
CLEAR  SEQUENCE 
PROFILE  DOWNWIND 

FUEL  STATE  6  POINT  7 
CLEAR  APPROACH  RECEIVED 
UPDATE  0  6  0  PROFILE  TO  TANKER 
PROFILE  INBOUND 

DELETE  0  8  1  BINGO  STATE  7  POINT  6 
ON  TIME  5  5 

DELETE  7  1  5 

DELETE  3  3  3  FUEL  STATE  1  POINT  9 

CLEAR  FUEL  STATE 

ADD  2  4  9  FUEL  STATE  2  POINT  0 
CLEAR  ON  TIME
FUEL  STATE  5  POINT  2 

CLEAR  MODE  REQUEST 

UPDATE  5  5  4  PROFILE  FOUL  DECK  WAVEOFF 
ADD  4  8  2  PROFILE  TECHNICAL  WAVEOFF 
ON  TIME  5  4 
CLEAR  BINGO  STATE 
UPDATE  3  2  1  ON  TIME  2  3 

DEPARTURE  SYNTAX 


ADD  6  5  2  AIRBORNE 
REMARKS  TRANSMITTER  DOWN 
ADD  3  4  1  RADAR  CONTACT
CLEAR  ARCING 
UPDATE  8  0  8  TIME  OFF  4  2 
CLEAR  BUTTON 

DELETE  7  8  9  REMARKS  INS  DOWN 
MOVE  16 

UPDATE  1  3  3  MOVE  5 
DELETE  2  2  7  ARCING 
REMARKS  NORDO  DOWN 
CLEAR  REMARKS 

ADD  4  9  6  REMARKS  ACLS  DOWN 
TIME  OFF  3  9

REMARKS  RECEIVER  DOWN 

MOVE  1  5 

SEND 

CLEAR  RADAR  CONTACT 

UPDATE  9  1  8  BUTTON  11
ADD  8  7  6  ARCING 
REMARKS  TACAN  DOWN 
AIRBORNE 

UPDATE  2  3  7  MOVE  20  7 
BUTTON  19 


MARSHAL  SYNTAX 


CLEAR  DISTANCE 

UPDATE  4  7  8  ANGELS  3 

CLEAR  SECOND  APPROACH  TIME 

DELETE  1  0  7  CHECK  IN  FUEL  STATE  4  POINT  1

CLEAR  HOLDING 

ADD  6  5  6  FIRST  APPROACH  TIME  3  0 
CLEAR  COMMENCING 

UPDATE  5  6  0  APPROACH  RECEIVED  9  ALPHA 

SECOND  APPROACH  TIME  4  5 

UPDATE  8  2  2  REMARKS  TACAN  DOWN 

ADD  9  9  4  COMMENCING  FUEL  STATE  3  POINT  6 
CLEAR  FIRST  APPROACH  TIME

HOLDING  FUEL  STATE  9  POINT  5
DISTANCE  2  8
CLEAR  ANGELS
ADD  7  0  0  HOLDING
CLEAR  CHECK  IN

ADD  6  6  9  HOLDING  FUEL  STATE  4  POINT  7

DELETE  0  0  0  SEQUENCE  0  BRAVO

DELETE  1  9  3  MODE  REQUEST  7  BRAVO

BINGO  STATE  3  POINT  2

UPDATE  0  4  5  SEQUENCE  5  ALPHA

DISTANCE  30  6

CLEAR  TIME  OFF
APPENDIX I


CATCC  RADIO  CALLS 


1.  Tarhat  Marshal,  this  is  Redstone  one  zero  two  in  company  with 
one  zero  three  on  your  two  four  five  radial  at  forty  six  miles, 
angels  twenty  seven,  low  state  eight  point  three,  over. 

2. Redstone one zero two, marshal. This will be a case three
recovery, altimeter two niner niner two. Redstone one zero two,
marshal two five zero for twenty three, angels eight. Expect approach
time four eight, approach button one six, time now two four and
one quarter, over.

3.  Redstone  one  zero  two,  roger. 

4.  Redstone  one  zero  three,  marshal  and  two  five  zero  for  twenty 
four,  angels  niner,  expect  approach  time  four  niner,  approach 
button  one  eight,  time  now  two  four  and  one  half,  over. 

5.  Redstone  one  zero  three,  roger. 

6.  Marshal,  this  is  two  one  three  with  two  one  four  in  company,  on 
your  three  zero  five  for  thirty  three,  angels  twenty  three,  low  state 
eight  point  six,  requesting  mode  two’s. 

7.  City  Desk  two  one  three,  marshal,  case  three  recovery,  altimeter 
two  niner  niner  two,  marshal  two  five  zero  radial,  at  twenty  five, 
angels  ten,  expect  approach  button  one  six,  time  now  two  seven 
and  one  quarter,  over. 

8.  City  Desk  two  one  three,  roger. 

9.  City  Desk  two  one  four,  marshal  two  five  zero  radial  at  twenty  six, 
angels  eleven,  expect  approach  time  five  one,  approach  button 
one  eight,  time  now  three  zero  and  one  half,  over. 

10.  City  Desk  two  one  four,  roger. 

11. Marshal, Canasta four zero zero checking in with playmate four
zero four on your two zero zero radial at thirty one, angels twenty
six, low state six point two, requesting mode one alpha’s.

12. Canasta four zero zero, marshal, case three recovery, altimeter
two niner niner two. Canasta four one zero marshal two five zero,
twenty one, angels six, expect approach time four six, approach
button one six, time now three one.

13.  Canasta  four  zero  zero,  roger. 

14.  Canasta  four  zero  four,  marshal  two  five  zero  for  twenty  two, 
angels  seven,  expect  approach  time  four  seven,  approach  button 
one  eight,  time  now  three  one  and  one  half. 

15.  Canasta  four  zero  four  roger,  button  one  eight. 

20.  Ten  seconds. 

Five 

Four 

Three 

Two 

One 

Mark,  time  three  three. 

21.  Marshal,  Redstone  one  zero  two  in  holding,  angels  eight,  state 
seven  point  nine. 

22.  Redstone  one  zero  two,  roger,  angels  eight. 

23.  Marshal,  Redstone  one  zero  three  in  holding,  angels  niner,  state 
eight  point  zero. 

24.  Redstone  one  zero  three,  roger,  angels  niner. 

28.  Marshal,  Canasta  four  zero  zero  in  holding,  angels  six,  state  four 
point  one. 

29.  Canasta  four  zero  zero,  roger. 

35.  Marshal,  City  Desk  two  one  three  in  holding  angels  ten,  state 
eight  point  one. 

36.  City  Desk  two  one  three,  roger  say  mode  requested. 

37.  Mode  two. 

43.  Marshal,  City  Desk  two  one  four,  established,  angels  eleven,  state 
eight  point  zero,  request  mode  two. 

44.  City  Desk  two  one  four,  roger. 
48. Canasta four zero four established, angels seven, state four point
five. 

49.  Canasta  four  zero  four,  roger. 

53.  Ten  seconds  until  time  four  three. 

54.  Five 
Four 

Three 

Two 

One 

Mark,  time  four  three. 

59.  Marshal,  Canasta  four  zero  zero  commencing,  state  three  point 
four. 

60.  Canasta  four  zero  zero,  radar  contact  twenty  one  miles,  final 
bearing  zero  seven  zero. 

62.  Canasta  four  zero  zero,  platform. 

63. Canasta four zero zero, go button one six.

66.  Canasta  four  zero  four  commencing,  state  three  point  three. 

67. Canasta four zero four, radar contact twenty two miles, final
bearing zero seven zero.

68.  Canasta  four  zero  four,  platform. 

69. Canasta four zero four, go button one eight.

70.  Redstone  one  zero  two  commencing,  state  six  point  four. 

71.  Redstone  one  zero  two  radar  contact  twenty  three  miles,  final 
bearing  zero  seven  zero. 

72.  Ninety  nine  Tarhat,  altimeter  two  niner  niner  five. 

73.  Redstone  one  zero  three  commencing,  state  six  point  zero. 

74. Redstone one zero three, radar contact twenty four miles, final
bearing  zero  seven  zero. 

75.  Redstone  one  zero  two,  platform. 

76.  Roger. 
77. Redstone one zero two, go button one six.

78. Redstone one zero two, switching.

80.  Magic  six  zero  four,  roger. 

81.  Marshal,  City  Desk  two  one  three  commencing,  state  five  point 
six. 

82.  City  Desk  two  one  three,  radar  contact  twenty  five  miles,  final 
bearing  zero  seven  zero. 

83.  Redstone  one  zero  three,  platform. 

84.  Redstone  one  zero  three,  roger. 

85.  Redstone  one  zero  three,  go  button  one  eight. 

86.  One  zero  three,  switching. 

87.  City  Desk  two  one  three,  platform. 

88.  Roger. 

89.  Marshal,  City  Desk  two  one  four  commencing,  state  five  point  five. 

90. City Desk two one four, radar contact twenty six miles, final
bearing zero seven zero.

91.  City  Desk  two  one  three,  go  button  one  six. 
APPENDIX J


RESPONSE  PHRASE  SAMPLE  FILE 

NOTE:  Taken  from  Subject  1  Quiet  (0  dBA)  and  Separate  (Marshal) 
condition. 


WORD                   WORD SCORE   REJECTION

CLEAR                  24           18
DISTANCE               20           18
UPDATE                 15           25
4                      18           23
7                      35           28
8                      12           25
ANGELS                 23           47
30                     23           21
CLEAR                  28           32
SECOND_APPROACH_TIM    23           23
DELETE                 15           32
1                      21           32
0                      39           44
7                      22           23
CHECK_IN               19           28
FUEL_STATE             19           33
4                      19           14
*P                     60           37
1                      20           14
*CLEAR                 37           16
HOLDING                22           33
ADD                    13           41
6                      13           16
5                      22           19
6                      14           17
FIRST_APPROACH_TIME    19           28
3                      25           21
0                      37           37
CLEAR                  16           21
COMMENCING             16           20
UPDATE                 12           23
5                      30           32
6                      11           11
0                      49           45
APPROACH_RECEIVED      22           26
5                      42           37
ALPHA                  19           32
SECOND_APPROACH_TIM    17           23
4                      23           21
5                      21           20
UPDATE                 13           19
8                      11           25
2                      19           21
2                      21           31
REMARKS                27           23
TACAN                  13           39
DOWN                   16           27
ADD                    11           43
9                      28           25
9                      33           2
4                      26           23
COMMENCING             23           24
FUEL_STATE             16           26
3                      21           20
P                      24           24
6                      14           18
CLEAR                  23           32
FIRST_APPROACH_TIME    15           23
HOLDING                26           36
FUEL_STATE             20           29
9                      36           31
P                      26           23
5                      22           30
DISTANCE               24           26
2                      21           24
9                      27           27
CLEAR                  30           24
ANGELS                 31           41
ADD                    11           40
7                      20           20
0                      37           42
0                      47           37
HOLDING                21           27
CLEAR                  22           27
CHECK_IN               17           27
COMMENCING             13           12
ADD                    9            43
6                      13           15
6                      13           14
[illegible]            34           30
HOLDING                22           37
FUEL_STATE             17           29
4                      16           13
*P                     40           22
7                      22           28
DELETE                 12           32
0                      34           41
0                      39           47
0                      43           44
SEQUENCE               25           22
0                      37           26
BRAVO                  17           25
DELETE                 17           34
1                      23           25
9                      49           47
3                      30           28
MODE_REQUEST           20           28
7                      24           29
BRAVO                  18           27
BINGO_STATE            13           24
3                      20           23
P                      23           21
2                      19           32
UPDATE                 13           23
*0                     35           18
4                      24           13
5                      19           21
SEQUENCE               26           23
5                      31           25
ALPHA                  20           27
DISTANCE               18           22
30                     18           21
6                      16           17
CLEAR                  23           24
TIME_OFF               13           32
SEQUENCE               14           13
SEQUENCE               12           12
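
If the WORD column is read as the recognizer's output (an assumption; the sample file does not say so), it can be compared word by word with the phrase the subject was asked to read from the Marshal test file. The sketch below is one simple way to list substitution errors for phrases of equal length. The function name `substitutions` is hypothetical, and this is not the scoring procedure used in the study.

```python
def substitutions(reference, recognized):
    """List (position, expected, got) for each word that differs.

    Only handles phrases with the same number of words; insertions and
    deletions would need an alignment step that is not attempted here.
    """
    ref_words = reference.split()
    rec_words = recognized.split()
    if len(ref_words) != len(rec_words):
        raise ValueError("phrases differ in length; not a pure substitution")
    return [
        (i, expected, got)
        for i, (expected, got) in enumerate(zip(ref_words, rec_words))
        if expected != got
    ]


# Test file phrase "DISTANCE 2 8" against the sampled words DISTANCE, 2, 9.
print(substitutions("DISTANCE 2 8", "DISTANCE 2 9"))
```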
APPENDIX K


CATCC  VOICE  RECOGNITION  POST-TEST  QUESTIONNAIRE 

1.  What  is  your  curriculum?  # _  Descriptor  _ 

2.  To  which  service  do  you  belong?  _  (i.e.,  USN,  etc.) 

3. What is your grade? _ (i.e., O-2, O-5, etc.)

4.  Are  you  a  Naval  Aviator?  Yes _  No _ 

5.  Are  you  a  Naval  Flight  Officer?  Yes _  No _ 

6.  Have  you  any  previous  experience  with  voice  recognition  systems? 

If yes, how many hours (approx.)? _ . Mark “0” if no
experience.

7.  Based  on  your  previous  training  and  work  experience,  how 
comfortable  or  uncomfortable  were  you  with  the  vocabulary  used 
in  this  experiment? 

_  Very  comfortable 

_  Comfortable 

_  Borderline 

_  Uncomfortable 

_  Very  uncomfortable 

8.  Based  on  your  previous  training  and  work  experience,  how 
comfortable  or  uncomfortable  are  you  utilizing  a  microphone? 

_  Very  comfortable 

_  Comfortable 

_  Borderline 
_  Uncomfortable

_  Very  uncomfortable 

9.  Have  you  ever  been  assigned  to  a  CATCC  or  CIC?  Yes _  No _ 

If  yes,  how  many  months?  _ (mos.) 

If  no,  have  you  ever  been  exposed  to  CATCC/CIC  operations? 
Yes  _  No _ 

If yes, how long? _ hours, weeks, months (circle one).

10.  The  training  session,  as  guided  by  the  experimenter,  was: 

_  Very  easy 

_  Quite  easy 

_  Fairly  easy 

_  Borderline 

_  Fairly  difficult 

_  Quite  difficult 

_  Very  difficult 

11.  The  quality  of  the  Sun  workstation  display  used  for  training  was: 
_  Excellent 

_  Good 

_  Only  fair 

_  Poor 

_  Terrible 

12. The quality of the WYSE display used for testing was:

_  Excellent

_  Good

_  Only fair

_  Poor

_  Terrible


13.  How  satisfied  were  you  with  the  ergonomics  of  the  microphone 
set? 

_  Very  satisfied

_  Satisfied 

_  Borderline 

_  Dissatisfied 

_  Very  dissatisfied 

14.  How  acceptable  or  unacceptable  do  you  feel  voice  input  technology 
is  for  the  CATCC  or  CIC  environment? 

_  Completely  acceptable 

_  Reasonably  acceptable 

_  Borderline 

_  Moderately  unacceptable 

_  Extremely  unacceptable 

15.  If  you  were  responsible  for  the  operation  of  a  CATCC  or  CIC,  how 
would  you  accept  a  fully  developed  voice  input  status  board  system 
to  replace  the  current  methodology? 

_  Without  hesitation 

_  With  little  hesitation 

_  With  some  hesitation 

_  With  great  hesitation 

16.  What  do  you  feel  are  the  major  issues  (pro  and/or  con)  with 
regard to utilizing voice input in the CATCC/CIC?


17. What other areas, if any, in the Armed Services do you see where
voice  input  could  be  used? 
APPENDIX L


OPEN-ENDED  QUESTION  RESULTS 

FROM  POST-TEST  QUESTIONNAIRE 

What  do  you  feel  are  the  major  issues  (pro  and/or  con)  with 
regard  to  utilizing  voice  input  in  the  CATCC/CIC? 

Reliability. Keeping the thing up.

Quality  of  displays. 

Noise  susceptibility. 

Training  and  turnover  of  various  personnel  to  system. 

Rapid replacement of personnel at a station during battle. “Killed -
now replace in midst of battle situation.”

Training of users: microphone fear.

Control  of  environmental  noises  that  are  quite  prevalent. 

Training 

System  maintenance. 

Operation  in  degraded  or  unusual  conditions. 

Ability  to  revert  to  manual  system  over  long  term  (lost  skills). 

Noise  level  is  much  higher  in  a  carrier  than  it  was  in  the  booth. 
Standardizing  key  words  and  phrases  may  be  difficult. 

Making  system  reliable  (error  rate  low). 

Making  system  sailor-proof  (rugged). 

Educating  Navy  to  benefits. 

Reliability. 
Pro: more readable and faster update of information on (status)
boards,  possible  space  savings. 

Faster,  more  accurate  data. 

Stress. 

Background  noise  interference. 

Overlapping  duty  sections  (changing  over  of  personnel). 

Fatigue. 

Pro: Free person from writing status on board; faster than writing.

Con: Interpreted incorrectly; able to respond in varying noise
environments. 

Reliability. 

Ease  of  training. 

Effect  of  flight-op  noise. 

Back-up  when  it  fails. 

Distinction  in  voices  due  to  colds. 

What other areas, if any, in the Armed Services do you see where
voice  input  could  be  used? 

Cockpits  of  all  types  of  A/C  (aircraft). 

Rapid  strike  coordination  messages,  surface  to  subsurface. 

ASWMOD  (coordination  of  antisubmarine  warfare  assets). 

CIC  (Combat  Information  Center) 

Aircraft. 

NTDS  (Navy  Tactical  Display  Systems) 

Onboard  aircraft  (routine  duties). 

Input  to  flight  navigation  systems. 
Message  preparation.

Briefs,  presentations. 

Other  types  of  status  board  maintenance. 

Command  and  control  for  unmanned  vehicles. 

Testing. 

Quick  display  information  updates. 

Anywhere  status  updates,  etc.  are  manually  recorded  and  consist  of  a 
finite  set  of  words. 

Software development: input can be much faster with voice
recognition than by keyboard.

Security  checkpoints  (possibly). 

Aircraft: to ease button smashing mode.

HUD (Heads Up Display) interface for coming aboard the ship, e.g.,
“SAY ALTITUDE” without leaving the meatball (the marker for landing
successfully aboard the aircraft carrier).

